Hi,
  I'm currently having problems with a graylog2-server node that has only
one radio input when the Elasticsearch cluster is too slow or unavailable.
The node seems to fill up its heap and then crashes with 100% CPU usage on
all cores. All the ZooKeeper client connections also time out after the
crash.


When Elasticsearch is rebalancing or recovering (yellow cluster), the
graylog2-server node with the radio input seems to accumulate messages (in
the heap, I think). When the heap is full (the yellow bar reaches its
maximum), the node crashes.
 

Here is the important info:

5 Elasticsearch nodes with 384GB RAM, 128G set for heap


1 graylog2-server node without any input (master)
1 graylog2-server node with radio input  (the one crashing)

I tried a 5G heap and a 200G heap (plus other random values), all with the
same result.


I also tried 2 graylog2-server nodes with a radio input on both, with the
same result.



Here are my graylog2-server settings:


elasticsearch_shards = 12
elasticsearch_replicas = 5
output_batch_size = 1000

# The number of parallel running processors.
# Raise this number if your buffers are filling up.
processbuffer_processors = 10
outputbuffer_processors = 5

processor_wait_strategy = blocking

# Size of internal ring buffers. Raise this if raising
# outputbuffer_processors does not help anymore.
# For optimum performance your LogMessage objects in the ring buffer should
# fit in your CPU L3 cache.
# Start server with --statistics flag to see buffer utilization.
# Must be a power of 2. (512, 1024, 2048, ...)
ring_size = 1024
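
For what it's worth, here is a rough sanity check of those settings in
Python. The helper names are my own (hypothetical), and I'm assuming all 5
Elasticsearch nodes are data nodes. It relies on the fact that Elasticsearch
does not place two copies of the same shard on one node. It is only a sketch
of the arithmetic, not a diagnosis of the crash.

```python
# Values copied from the settings above; ES_DATA_NODES assumes that all
# 5 Elasticsearch nodes hold data (an assumption, not confirmed).
ES_DATA_NODES = 5
SETTINGS = {
    "elasticsearch_shards": 12,
    "elasticsearch_replicas": 5,
    "ring_size": 1024,
}


def shard_copies(shards: int, replicas: int) -> int:
    """Total shard copies per index: each primary plus its replicas."""
    return shards * (1 + replicas)


def unassignable_replicas(shards: int, replicas: int, data_nodes: int) -> int:
    """Replica copies per index that can never be allocated.

    Two copies of the same shard are never placed on one node, so at most
    `data_nodes` copies of each shard can be assigned.
    """
    copies_per_shard = 1 + replicas
    return shards * max(0, copies_per_shard - data_nodes)


def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two, as ring_size must be."""
    return n > 0 and (n & (n - 1)) == 0


if __name__ == "__main__":
    shards = SETTINGS["elasticsearch_shards"]
    replicas = SETTINGS["elasticsearch_replicas"]
    print("shard copies per index:", shard_copies(shards, replicas))
    print("unassignable replicas per index:",
          unassignable_replicas(shards, replicas, ES_DATA_NODES))
    print("ring_size is a power of two:",
          is_power_of_two(SETTINGS["ring_size"]))
```

If the data-node assumption holds, each index wants 6 copies of every shard
but only 5 can be placed, so some replicas would stay unassigned and the
cluster would report yellow for that reason alone.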


I would appreciate some advice, as this system is almost ready to go to
production, but this issue is a showstopper.




