Martin, thanks a lot for your reply, and sorry for getting back to you this late.

Actually I am using an almost similar configuration; my regular load is ~8,000 msgs/s, and in peak times it can go up to 30,000. Your settings for a high-load environment look very good, as my cluster uses almost the same ones and is quite stable now. However, the problem I am facing now, after a lot of tuning, is graylog2-server heap overflow. I am working on a customization of the server that stops reading messages from Kafka when the heap is 80% full and resumes reading later. I think it happens because of the structure of the log messages my cluster is processing; sometimes they are huge, with full Java stack traces.
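The pause/resume logic I have in mind looks roughly like the sketch below. This is only an illustration of the approach, not actual Graylog2 code; the KafkaInput interface and both thresholds are placeholders for whatever the real input implementation exposes.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Placeholder for the real Kafka input; the actual class and method
// names in graylog2-server will differ.
interface KafkaInput {
    void pause();
    void resume();
}

public class HeapBackpressure implements Runnable {
    private static final double PAUSE_AT  = 0.80; // stop reading at 80% heap
    private static final double RESUME_AT = 0.60; // start again below 60%

    private final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    private final KafkaInput input;
    private boolean paused = false;

    public HeapBackpressure(KafkaInput input) {
        this.input = input;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            long max = heap.getMax(); // -1 if no explicit limit is set
            double used = max > 0 ? (double) heap.getUsed() / max : 0.0;
            if (!paused && used >= PAUSE_AT) {
                input.pause();      // stop polling Kafka
                paused = true;
            } else if (paused && used <= RESUME_AT) {
                input.resume();     // safe to read again
                paused = false;
            }
            try {
                Thread.sleep(1000); // check once per second
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}

I would resume at a lower threshold than I pause at, so the input does not flap on and off around the 80% mark.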
I used output_batch_size = 30000, but then I decided to reduce it to 5000 so that the server writes more often; maybe I misunderstood this setting. About processbuffer and outputbuffer, I read somewhere that it is better to set outputbuffer_processors to half of processbuffer_processors. And about the shards, I think it will be better to use two shards, because four shards would mean two shards per node for each index, so the query cost would increase.
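To make that concrete, the relevant part of my server.conf currently looks something like this (the processbuffer_processors value is just an example to show the 2:1 ratio; only output_batch_size and elasticsearch_shards are the values I described above):

# write smaller batches, more often
output_batch_size = 5000

# outputbuffer_processors = processbuffer_processors / 2
processbuffer_processors = 8
outputbuffer_processors = 4

# two shards, so each index keeps one shard per data node
elasticsearch_shards = 2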
But I will try timeout.DEFAULT=60s; I haven't tried that setting yet.

Thanks a lot for your help.

Asad

On Thursday, 12 June 2014 18:01:27 UTC+9, Martin René Mortensen wrote:
>
> Hi Asad,
>
> I'm running a graylog2 0.20.2 setup with ~5,000 msgs/s and peaks around
> 10,000 msgs/s. It can be tricky to set up, especially if you also want
> to be able to search through it all with decent response times.
>
> I found that increasing the number of elasticsearch nodes helped
> immensely with both indexing and search performance, as if elasticsearch
> just likes more nodes.
>
> This is my setup:
>
> 2x 8-vCPU elasticsearch 0.90.10 nodes
> 1x 5-vCPU graylog2-server 0.20.2 node with a UDP syslog input
> 1x 1-vCPU graylog2-web 0.20.2 node
>
> I use the following tunings in /etc/elasticsearch/elasticsearch.conf:
>
> index.translog.flush_threshold_ops: 50000
> index.refresh_interval: 15s
> #index.cache.field.type: soft
> index.cache.field.max_size: 10000
> threadpool.bulk.queue_size: 500
>
> I use the following settings in /etc/graylog2/server.conf:
>
> elasticsearch_shards = 4
> elasticsearch_replicas = 0
> elasticsearch_analyzer = standard
> output_batch_size = 60000
> processbuffer_processors = 40
> outputbuffer_processors = 60
> processor_wait_strategy = blocking
> ring_size = 8192
>
> and for /etc/graylog2/web.conf on the web node:
>
> # Higher time-out to avoid failures
> timeout.DEFAULT=60s
>
> I'm not sure how much it can take, but we have peaks at >10,000 msgs/s.
> I also have a lot of custom Drools rules on my graylog2 instance doing
> field extractions on all the Cisco ASA and ACE logs, which uses a lot of
> the CPU on that node.
>
> Hope this helps point you in the right direction.
>
> /Martin
>
> On Wednesday, 11 June 2014 10:44:12 UTC+2, Arie wrote:
>>
>> Hi Asad,
>>
>> Searching around I found a very fine article about Graylog2 with
>> Elasticsearch; maybe there is some info in it to help you out. I am
>> trying to build my own Elasticsearch cluster here.
>>
>> http://edgeofsanity.net/article/2012/12/26/elasticsearch-for-logging.html
>>
>> Arie.
>>
>> On Monday, June 9, 2014 2:37:52 AM UTC+2, Asad Mehmood wrote:
>>>
>>> Good day!
>>>
>>> Recently I started implementing a log monitoring and analysis system
>>> using graylog2; we will have around 12,000 messages/second. Though in
>>> staging we are not even near that number, the cluster is not stable.
>>>
>>> Sometimes ES discovery fails, because either the machine is in I/O
>>> wait or there are too many processes on each core.
>>> I tried to tune the settings, but one way or another the cluster
>>> finds a way to fail. For my setup there are some limitations, for a
>>> while, on using high-speed I/O, so I need to either stick with slow
>>> disks or split the setup so that recent logs stay on high-speed disks
>>> and older ones are moved to a low-performance cluster. I was hoping
>>> someone could help me formulate or calculate a formula to decide how
>>> many nodes I need for the ES cluster, graylog2-server, radio and
>>> Kafka.
>>>
>>> There is another problem with the Kafka input: if I shut down Kafka,
>>> ZooKeeper or radio, the messages stop coming and I need to terminate
>>> the Kafka input and launch a new one.
>>> Also, the message throughput while using Kafka and radio is far lower
>>> than when using direct inputs with the graylog2-benchmark tool.
>>>
>>> Current setup:
>>> 2x log collector + radio (8 GB, 2-core Xeon)
>>> 1x graylog2-server + graylog2-web (16 GB, 4-core Xeon)
>>> 1x graylog2-server + elasticsearch (16 GB, 4-core Xeon)
>>> 3x elasticsearch + Kafka (16 GB, 4-core Xeon)
>>>
>>> The message throughput in peak hours will be 12,000/second, and to
>>> put this system into production it needs to withstand a stress test
>>> of 20,000 messages/second.
>>>
>>> I would appreciate it if anyone here could help me formulate the
>>> performance requirements by quantifying them.
>>>
>>> Regards,
>>>
>>> Asad
