Hi, I'm currently experiencing problems with graylog2-server with a single radio input when the Elasticsearch cluster is too slow or unavailable. It seems to fill up the heap, and then it crashes with 100% CPU usage on all cores. All the ZooKeeper client connections also time out after the crash.
When Elasticsearch is rebalancing or recovering (yellow cluster), the graylog2-server with the radio input seems to accumulate messages (in the heap, I think), and when the heap is full (the yellow bar goes to max) it crashes.

Here is the important info:

- 5 Elasticsearch nodes with 384GB RAM, 128G set for heap
- 1 graylog2-server node without any input (master)
- 1 graylog2-server node with radio input (the one crashing)

I tried with a 5G heap and with a 200G heap (plus other random values), all with the same results.
I tried with 2 graylog2-server nodes with a radio input on both: same results.

Here are my graylog2-server settings:

elasticsearch_shards = 12
elasticsearch_replicas = 5
output_batch_size = 1000

# The number of parallel running processors.
# Raise this number if your buffers are filling up.
processbuffer_processors = 10
outputbuffer_processors = 5
processor_wait_strategy = blocking

# Size of internal ring buffers. Raise this if raising outputbuffer_processors does not help anymore.
# For optimum performance your LogMessage objects in the ring buffer should fit in your CPU L3 cache.
# Start server with --statistics flag to see buffer utilization.
# Must be a power of 2. (512, 1024, 2048, ...)
ring_size = 1024

I would appreciate some advice, as this system is almost ready to go to production, but we have this show-stopper issue.
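As a side note on the shard settings, here is a rough sketch of the arithmetic they imply. It assumes all 5 Elasticsearch nodes hold data and relies on the Elasticsearch rule that two copies of the same shard are never placed on the same node. The function name and the returned keys are made up for illustration; this is not part of graylog2 or Elasticsearch.

```python
def shard_allocation(shards, replicas, data_nodes):
    """Estimate how many shard copies of one index can be assigned.

    Assumption: Elasticsearch never puts two copies of the same shard
    on one node, so each shard can have at most `data_nodes` copies assigned.
    """
    copies_per_shard = 1 + replicas                      # primary + replicas
    assignable = min(copies_per_shard, data_nodes)       # capped by node count
    unassigned = shards * (copies_per_shard - assignable)
    return {
        "total_copies": shards * copies_per_shard,
        "unassigned_replicas": unassigned,
        # Primaries assigned but some replicas missing means yellow.
        "expected_status": "green" if unassigned == 0 else "yellow",
    }


if __name__ == "__main__":
    # Values from the settings above: 12 shards, 5 replicas, 5 nodes.
    print(shard_allocation(shards=12, replicas=5, data_nodes=5))
```

With 12 shards, 5 replicas and 5 nodes, this gives 72 shard copies per index, of which 12 replicas can never be assigned, so under these assumptions every index would stay yellow even when the cluster is otherwise healthy. With 4 replicas or fewer the sketch reports green.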
