I still have a crash once in a while - when some attack or probe hits our 
firewalls, I guess. Rebooting the elasticsearch nodes seems to clear it up.

I might have overcommitted my memory, as I am running with a 24G heap on 32G 
machines - the recommendation is that the heap should be 50% of server 
memory, but with less heap, elasticsearch crashes from running out of heap 
once in a while.
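That rule of thumb can be sanity-checked with a tiny sketch - illustrative only, using the commonly cited 50%-of-RAM guideline and the ~31 GB ceiling that keeps the JVM on compressed object pointers (both are guidance, not hard limits):

```python
# Rough Elasticsearch heap-sizing check (rule of thumb, not official tooling).
# Guideline: heap = 50% of RAM, but stay under ~31 GB so the JVM can keep
# using compressed object pointers.

def recommended_heap_gb(ram_gb: float, oops_ceiling_gb: float = 31.0) -> float:
    """Return the rule-of-thumb heap size for a node with ram_gb of RAM."""
    return min(ram_gb / 2.0, oops_ceiling_gb)

print(recommended_heap_gb(32))  # 16.0 -- so a 24G heap on a 32G box overcommits
```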

I hope graylog2 0.91 will fix this by putting overflowing messages into 
temporary disk storage instead of filling up memory until it dies, but I'm 
not sure it can ever catch up.

Brgds. Martin

On Thursday, 18 September 2014 11:06:09 UTC+2, Asad Mehmood wrote:
>
> Martin, thanks a lot for your reply, and very sorry for getting back this 
> late. 
>
>
> Actually, I am using an almost identical configuration: 
> my regular load is ~8,000 msgs/s, and at peak times it can reach 30,000. 
> You have very good settings for a high-load environment; my cluster uses 
> almost the same settings and is quite stable now.
>
> However, the problem I am facing now, after a lot of tuning, is heap 
> overflow on the graylog2-servers. I am working on a customization of the 
> server to stop reading messages from Kafka when the heap is 80% full and 
> to resume reading later.
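That pause/resume idea is essentially watermark-based backpressure. A minimal sketch of the decision logic (the function name, the 80% high watermark, and the 60% low watermark are illustrative, not graylog2 code; the gap between the two watermarks avoids flapping right at the threshold):

```python
# Hypothetical heap-watermark backpressure for a Kafka consumer.
def should_consume(heap_used: float, consuming: bool,
                   high: float = 0.80, low: float = 0.60) -> bool:
    """Decide whether to keep reading from Kafka given heap usage (0.0-1.0)."""
    if consuming and heap_used >= high:
        return False      # heap too full: pause the consumer
    if not consuming and heap_used <= low:
        return True       # heap drained enough: resume
    return consuming      # otherwise keep the current state (hysteresis)

print(should_consume(0.85, consuming=True))   # False: pause at 85% heap
print(should_consume(0.70, consuming=False))  # False: still draining
print(should_consume(0.50, consuming=False))  # True: safe to resume
```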
>
> I think it's because of the structure of the log messages my cluster is 
> processing; sometimes they are huge, with Java stack traces.
> I used 
> output_batch_size = 30000 
> but then I decided to reduce it to 5000 to write more often; maybe I 
> misunderstood this setting.
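The trade-off behind that setting can be made concrete: a smaller batch flushes more often (lower latency per message, more bulk requests), a larger one flushes less often (better throughput, more held in memory). A toy model, with the flush counter standing in for a bulk write to Elasticsearch:

```python
# Toy batch writer: collects messages and "flushes" every batch_size messages.
class BatchWriter:
    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.batch = []
        self.flushes = 0  # stand-in counter for bulk requests sent

    def write(self, msg):
        self.batch.append(msg)
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.batch:
            self.flushes += 1  # a real output would bulk-index the batch here
            self.batch = []

w = BatchWriter(5000)
for i in range(30_000):
    w.write(i)
print(w.flushes)  # 6 bulk writes for 30,000 messages, vs. 1 at batch_size=30000
```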
> And about the processbuffer and outputbuffer, I read somewhere that it is 
> better to have
>
> outputbuffer_processors = processbuffer_processors / 2.
>
> And yes, about the shards: I think it will be better to use two shards, 
> because 4 shards mean 2 shards per node for each index, so the query cost 
> will increase.
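The arithmetic behind that, spelled out for a 2-data-node cluster (a sketch; real query cost also depends on shard size, replicas, and caching):

```python
# Shards each data node must search per index:
# primaries * (1 + replicas), spread evenly across the data nodes.
def shards_per_node(primaries: int, replicas: int, nodes: int) -> float:
    return primaries * (1 + replicas) / nodes

print(shards_per_node(4, 0, 2))  # 2.0 shards per node with 4 primaries
print(shards_per_node(2, 0, 2))  # 1.0 shard per node with 2 primaries
```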
>
> But I will try 
> timeout.DEFAULT=60s  
>
> I haven't tried this setting yet.
>
> thanks a lot for your help.
>
> Asad
>
> On Thursday, 12 June 2014 18:01:27 UTC+9, Martin René Mortensen wrote:
>>
>> Hi Asad,
>>
>> I'm running a graylog2 0.20.2 setup with ~5,000 msgs/s and peaks around 
>> 10,000 msgs/s. It can be tricky to set up, especially if you also want to 
>> be able to search through it all with decent response times.
>>
>> I found that increasing the number of elasticsearch nodes helped 
>> immensely with both indexing and search performance, as if elasticsearch 
>> just likes more nodes.
>>
>> This is my setup:
>>
>> 2 8vcpu elasticsearch 0.90.10 nodes
>> 1 5vcpu graylog2-server 0.20.2 node with udp syslog input
>> 1 1vcpu graylog2-web 0.20.2 node
>>
>> I use the following tunings in /etc/elasticsearch/elasticsearch.conf:
>>
>> index.translog.flush_threshold_ops: 50000
>> index.refresh_interval: 15s
>>
>> #index.cache.field.type: soft
>> index.cache.field.max_size: 10000
>> threadpool.bulk.queue_size: 500
>>
>>
>>
>> I use the following settings in /etc/graylog2/server.conf:
>>
>> elasticsearch_shards = 4
>> elasticsearch_replicas = 0
>>
>> elasticsearch_analyzer = standard
>> output_batch_size = 60000
>> processbuffer_processors = 40
>> outputbuffer_processors = 60
>> processor_wait_strategy = blocking
>> ring_size = 8192
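Two quick sanity checks on values like these - a sketch only: graylog2's ring buffer is an LMAX Disruptor, whose size must be a power of two, and the outputbuffer = processbuffer / 2 rule of thumb Asad mentions elsewhere in the thread is easy to compute:

```python
# Sanity-check buffer settings (illustrative values from the config above).
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0

ring_size = 8192
processbuffer_processors = 40

print(is_power_of_two(ring_size))     # True: 8192 is a valid Disruptor size
print(processbuffer_processors // 2)  # 20: rule-of-thumb outputbuffer count
```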
>>
>> and for /etc/graylog2/web.conf on web node:
>>
>> # Higher time-out to avoid failures
>> timeout.DEFAULT=60s
>>
>>
>> I'm not sure how much it can take, but we have peaks at >10,000 msgs/s. I 
>> also have a lot of custom Drools rules on my graylog2 instance doing field 
>> extractions on all the Cisco ASA and ACE logs, which uses a lot of the 
>> CPU on that node.
>>
>> Hope this helps point you in the right direction.
>>
>> /Martin
>>
>> On Wednesday, 11 June 2014 10:44:12 UTC+2, Arie wrote:
>>>
>>> Hi Asad,
>>>
>>> Searching around, I found a very good article about Graylog2 with 
>>> Elasticsearch; maybe there is some info in it to help you out. I am 
>>> trying to build my own Elasticsearch cluster here.
>>>
>>> http://edgeofsanity.net/article/2012/12/26/elasticsearch-for-logging.html
>>>
>>>
>>> Arie.
>>>
>>>
>>> On Monday, June 9, 2014 2:37:52 AM UTC+2, Asad Mehmood wrote:
>>>>
>>>> Good day!
>>>>
>>>> Recently I started implementing a log monitoring and analysis system 
>>>> using graylog2; we will have around 12,000 messages/second. Though in 
>>>> staging we are not even near that number, the cluster is not stable.
>>>>
>>>> Sometimes ES discovery fails because the machine is either in I/O wait 
>>>> or has too many processes per core. 
>>>> However I tune the settings, the cluster finds a way to fail. My setup 
>>>> is limited for a while in its use of high-speed I/O, so I need to 
>>>> either stick with slow disks or divide the setup so that recent logs 
>>>> stay on high-speed disks and older ones are moved to a low-performance 
>>>> cluster. I was hoping someone could help me work out a formula for how 
>>>> many nodes I need for the ES cluster, graylog2-server, radio, and Kafka.
>>>>
>>>> There is another problem with the Kafka input: if I shut down Kafka, 
>>>> Zookeeper, or radio, the messages stop coming and I need to terminate 
>>>> the Kafka input and launch a new one.
>>>> Also, the message throughput while using Kafka and radio is far less 
>>>> than with direct inputs, as measured by the graylog2-benchmark tool.
>>>>
>>>> Current setup:
>>>> 2 nodes: log collector + radio (8 GB, 2-core Xeon)
>>>> 1 node: graylog2-server + graylog2-web (16 GB, 4-core Xeon)
>>>> 1 node: graylog2-server + elasticsearch (16 GB, 4-core Xeon)
>>>> 3 nodes: elasticsearch + Kafka (16 GB, 4-core Xeon)
>>>>
>>>> The message throughput in peak hours will be 12,000/second, and to put 
>>>> this system into production it needs to withstand a stress test of 
>>>> 20,000 messages/second. 
>>>>
>>>> I would appreciate it if anyone here could help me quantify the 
>>>> performance requirements.
>>>>
>>>>
>>>> regards,
>>>>
>>>> Asad
>>>>
>>>>
>>>>
>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"graylog2" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
