Yes, graylog2 currently caches messages in the heap; this is a known 
shortcoming.
You will have to stop graylog2 in these cases. We really rely on 
elasticsearch being available and fast enough.

We will eventually add a disk or message bus queue mechanism, but we have 
postponed that because of the added complexity and more pressing things to 
work on and fix.
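
To picture the difference between unbounded heap caching and a bounded 
buffer, here is a rough sketch in Python. It is not graylog2 code: 
`BoundedBuffer`, `offer` and `drain` are made-up names, and the capacity of 
1000 is only chosen to match the output_batch_size quoted below. Messages 
that do not fit are refused instead of piling up in memory; a disk or 
message bus queue would be the place to park them.

```python
import queue


class BoundedBuffer:
    """Hypothetical in-memory buffer with a hard capacity (illustration only)."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)
        self.rejected = 0  # messages refused because the buffer was full

    def offer(self, message):
        """Try to enqueue without blocking; return False when the buffer is full."""
        try:
            self._q.put_nowait(message)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def drain(self, max_items):
        """Remove and return up to max_items messages, e.g. one output batch."""
        batch = []
        while len(batch) < max_items:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch

    def __len__(self):
        return self._q.qsize()


# Simulate an unavailable backend: 5000 messages arrive, nothing is drained.
buf = BoundedBuffer(capacity=1000)
accepted = sum(buf.offer(f"msg-{i}") for i in range(5000))
print(accepted, buf.rejected, len(buf))  # 1000 4000 1000
```

Memory use stays capped at the capacity no matter how long the backend is 
down; the cost is that the sender has to cope with refused messages.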

Best,
Kay

On Tuesday, April 22, 2014 5:04:47 PM UTC+2, Stephane Boisvert wrote:
>
> Hi,
>   I'm currently experiencing problems with graylog2-server with only one 
> radio input when the elasticsearch cluster is too slow or unavailable. It 
> seems to fill up the heap and then crashes with 100% CPU usage on all 
> cores. Also, all the zookeeper client connections time out after the 
> crash.
>
>
> When elasticsearch is rebalancing or recovering (yellow cluster), the 
> graylog2-server with the radio input seems to accumulate messages (in the 
> heap, I think), and when the heap is full (the yellow bar goes to the 
> max) it crashes.
>  
>
> Here is the important info:
>
> 5 elasticsearch nodes with 384 GB RAM, 128 GB set for heap
>
>
> 1 graylog2-server node without any input (master)
> 1 graylog2-server node with radio input (the one crashing)
>
> I tried with a 5 GB heap and with a 200 GB heap (plus other random 
> values), all with the same results.
>
>
> I also tried with 2 graylog2-server nodes with a radio input on both: 
> same results.
>
>
>
> Here are my graylog2-server settings:
>
>
> elasticsearch_shards = 12
> elasticsearch_replicas = 5
> output_batch_size = 1000
>
> # The number of parallel running processors.
> # Raise this number if your buffers are filling up.
> processbuffer_processors = 10
> outputbuffer_processors =  5
>
> processor_wait_strategy = blocking
>
> # Size of internal ring buffers. Raise this if raising outputbuffer_processors does not help anymore.
> # For optimum performance your LogMessage objects in the ring buffer should fit in your CPU L3 cache.
> # Start server with --statistics flag to see buffer utilization.
> # Must be a power of 2. (512, 1024, 2048, ...)
> ring_size = 1024
>
>
> I would like some advice, as this system is almost ready to go to 
> production, but we have this showstopper issue.
>
>
>
>
>
>
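
A side note on the quoted settings, as a small worked calculation in Python. 
Elasticsearch does not place two copies of the same shard on one node, so 
with 5 nodes at most 5 copies of each shard can be assigned. The sketch 
assumes all five nodes hold data and that default allocation rules apply; 
`shard_allocation` is a made-up helper, and the numbers are per index.

```python
def shard_allocation(nodes, shards, replicas):
    """Count shard copies per index and how many can be placed on distinct nodes."""
    copies_per_shard = 1 + replicas  # one primary plus its replicas
    assignable_per_shard = min(copies_per_shard, nodes)  # one copy per node at most
    total = shards * copies_per_shard
    assignable = shards * assignable_per_shard
    return {
        "total_copies": total,
        "assignable": assignable,
        "unassigned": total - assignable,
    }


# Values from the quoted settings: 5 nodes, elasticsearch_shards = 12,
# elasticsearch_replicas = 5.
result = shard_allocation(nodes=5, shards=12, replicas=5)
print(result)  # {'total_copies': 72, 'assignable': 60, 'unassigned': 12}
```

Under those assumptions 12 replica shards per index can never be assigned, 
which on its own keeps the cluster yellow. Whether that contributes to the 
slowness described above is not something this calculation can tell.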
