Yes, 2 should probably be fine. I have had 4 Elasticsearch nodes, but went back to 1, then to 2.
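If you follow the one-primary-shard-per-node guidance with two nodes, the relevant lines in /etc/graylog2/server.conf would look something like this (illustrative values, not from a tested setup; replicas = 1 trades disk space for redundancy, Martin runs replicas = 0):

```
elasticsearch_shards = 2
elasticsearch_replicas = 1
```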
On Friday, 13 June 2014 13:04:47 UTC+2, Arie wrote:
>
> Hi Martin,
>
> Why SHARDS=4, while my guess is that with two nodes there should be 2 configured, if I were to follow the directions by Graylog?
>
> On Thursday, June 12, 2014 11:01:27 AM UTC+2, Martin René Mortensen wrote:
>>
>> Hi Asad,
>>
>> I'm running a graylog2 0.20.2 setup with ~5000 msgs/s and peaks around 10000 msgs/s. It can be tricky to set up, especially if you also want to be able to search through it all with decent response times.
>>
>> I found that increasing the number of Elasticsearch nodes helped immensely with both indexing and search performance; Elasticsearch just seems to like more nodes.
>>
>> This is my setup:
>>
>> 2x 8-vCPU Elasticsearch 0.90.10 nodes
>> 1x 5-vCPU graylog2-server 0.20.2 node with UDP syslog input
>> 1x 1-vCPU graylog2-web 0.20.2 node
>>
>> I use the following tunings in /etc/elasticsearch/elasticsearch.conf:
>>
>> index.translog.flush_threshold_ops: 50000
>> index.refresh_interval: 15s
>>
>> #index.cache.field.type: soft
>> index.cache.field.max_size: 10000
>> threadpool.bulk.queue_size: 500
>>
>> I use the following settings in /etc/graylog2/server.conf:
>>
>> elasticsearch_shards = 4
>> elasticsearch_replicas = 0
>>
>> elasticsearch_analyzer = standard
>> output_batch_size = 60000
>> processbuffer_processors = 40
>> outputbuffer_processors = 60
>> processor_wait_strategy = blocking
>> ring_size = 8192
>>
>> and for /etc/graylog2/web.conf on the web node:
>>
>> # Higher time-out to avoid failures
>> timeout.DEFAULT=60s
>>
>> I'm not sure how much it can take, but we have peaks at >10000 msgs/s. I also have a lot of custom Drools rules on my graylog2 instance doing field extractions of all the Cisco ASA and ACE logs, which uses a lot of the CPU on that node.
>>
>> Hope this helps point you in the right direction.
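On the shards question: the total number of shards the cluster carries depends not just on elasticsearch_shards but also on how many indices Graylog retains before rotation. A small sketch of the arithmetic (the 20-index retention figure is an assumed example, not from this thread):

```python
def total_shards(retained_indices, shards_per_index, replicas):
    # Each index holds `shards_per_index` primaries, plus one full
    # copy of those primaries per configured replica.
    return retained_indices * shards_per_index * (1 + replicas)

# Martin's settings (shards = 4, replicas = 0) with 20 retained indices:
print(total_shards(20, 4, 0))  # 80 shards spread over the ES nodes
```

With two data nodes that is 40 shards per node; more shards per index lets indexing parallelize across nodes, at the cost of more open shards overall.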
>>
>> /Martin
>>
>> On Wednesday, 11 June 2014 10:44:12 UTC+2, Arie wrote:
>>>
>>> Hi Asad,
>>>
>>> Searching around I found a very fine article about Graylog2 with Elasticsearch; maybe there is some info in it to help you out. I am trying to build my own Elasticsearch cluster here.
>>>
>>> http://edgeofsanity.net/article/2012/12/26/elasticsearch-for-logging.html
>>>
>>> Arie.
>>>
>>> On Monday, June 9, 2014 2:37:52 AM UTC+2, Asad Mehmood wrote:
>>>>
>>>> Good day!
>>>>
>>>> Recently I started implementing a log monitoring and analysis system using graylog2. We will have around 12,000 messages/second. Though in staging we are not even near that number, the cluster is not stable.
>>>>
>>>> Sometimes ES discovery fails because either the machine is in I/O wait or there are too many processes per core. I tried to tune the settings, but one way or another the cluster finds a way to fail. For my setup there are limitations for a while on using high-speed I/O, so I need to either stick with slow disks or divide the setup so that recent logs stay on high-speed disks and older ones are moved to a low-performance cluster. I was hoping someone could help me formulate or calculate a formula to decide how many nodes I need for the ES cluster, graylog2-server, radio and Kafka.
>>>>
>>>> There is another problem with the Kafka input: if I shut down Kafka, ZooKeeper or radio, the messages stop coming and I need to terminate the Kafka input and launch a new one. Also, the message throughput while using Kafka and radio is far less than using direct inputs with the graylog2-benchmark tool.
>>>>
>>>> Current setup:
>>>> 2x Log Collector + Radio nodes (8 GB, 2-core Xeon)
>>>> 1x graylog2-server + graylog2-web (16 GB, 4-core Xeon)
>>>> 1x graylog2-server + Elasticsearch (16 GB, 4-core Xeon)
>>>> 3x Elasticsearch + Kafka nodes (16 GB, 4-core Xeon)
>>>>
>>>> The message throughput in peak hours will be 12,000/second, and to put this system in production it needs to withstand a stress test of 20,000 messages/second.
>>>>
>>>> I would appreciate it if anyone here can help me quantify the performance requirements.
>>>>
>>>> regards,
>>>>
>>>> Asad

--
You received this message because you are subscribed to the Google Groups "graylog2" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
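There is no exact formula for the node count Asad asks about, but a back-of-envelope sketch is possible. In the sketch below, the per-node indexing capacity (~5000 msgs/s per 8-vCPU ES node, roughly consistent with Martin's two nodes peaking above 10,000 msgs/s) and the 50% headroom are assumptions you would have to benchmark on your own hardware, especially with slow disks:

```python
import math

def nodes_needed(peak_msgs_per_sec, per_node_msgs_per_sec, headroom=0.5):
    """Estimate Elasticsearch node count for a target ingest rate.

    headroom is the fraction of each node's capacity kept free for
    searches, shard recoveries and traffic spikes.
    """
    usable = per_node_msgs_per_sec * (1 - headroom)
    return math.ceil(peak_msgs_per_sec / usable)

# Stress-test target from this thread: 20,000 msgs/s
print(nodes_needed(20000, 5000))  # 8 nodes at 50% headroom
```

For the 12,000 msgs/s steady peak the same assumptions give 5 nodes; the point of the headroom term is that a cluster sized to exactly its ingest rate has nothing left for queries or recovery.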
