Yes, graylog2 currently caches messages in the heap; this is a known shortcoming. In these cases you will have to stop graylog2, because we really rely on elasticsearch being available and fast enough.
We will eventually add a disk or message bus queue mechanism, but because of the added complexity and more pressing things to work on and fix, we have postponed that.

Best,
Kay

On Tuesday, April 22, 2014 5:04:47 PM UTC+2, Stephane Boisvert wrote:
> Hi,
>
> I'm currently experiencing problems with graylog2-server with only one radio input when the elasticsearch cluster is too slow or unavailable. It seems to fill up the heap and then it crashes with 100% CPU usage on all cores. Also, all the zookeeper client connections time out after the crash.
>
> When elasticsearch is rebalancing or recovering (yellow cluster), it seems that the graylog2-server with the radio input accumulates messages (in the heap, I think), and when it's full (the yellow bar goes to max) it crashes.
>
> Here is the important info:
>
> 5 elasticsearch nodes with 384GB RAM, 128G set for heap
>
> 1 graylog2-server node without any input (master)
> 1 graylog2-server node with radio input (the one crashing)
>
> I tried with a 5G heap and with a 200G heap (plus other random values), all with the same results.
>
> I tried with 2 graylog2-server nodes with radio input on both, with the same results.
>
> Here are my graylog2-server settings:
>
> elasticsearch_shards = 12
> elasticsearch_replicas = 5
> output_batch_size = 1000
>
> # The number of parallel running processors.
> # Raise this number if your buffers are filling up.
> processbuffer_processors = 10
> outputbuffer_processors = 5
>
> processor_wait_strategy = blocking
>
> # Size of internal ring buffers. Raise this if raising outputbuffer_processors does not help anymore.
> # For optimum performance your LogMessage objects in the ring buffer should fit in your CPU L3 cache.
> # Start server with --statistics flag to see buffer utilization.
> # Must be a power of 2. (512, 1024, 2048, ...)
> ring_size = 1024
>
> I would like some advice, as this system is almost ready to go to production, but we have this show-stopper issue.

--
You received this message because you are subscribed to the Google Groups "graylog2" group.
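
For readers following this thread, the failure mode described above (messages piling up in memory while the output is slow or unavailable) can be illustrated with a minimal Python sketch. This is not graylog2 code: `BoundedBuffer` and `simulate` are hypothetical names, and the capacity and batch size are borrowed from the `ring_size` and `output_batch_size` settings quoted above purely for illustration. The sketch assumes that a buffer which refuses new messages once full is an acceptable trade-off, so that memory use stays capped and the caller can decide whether to drop, retry, or spill to disk.

```python
from collections import deque


class BoundedBuffer:
    """Hypothetical in-memory buffer that rejects messages once it is full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = deque()
        self.rejected = 0  # messages refused because the buffer was full

    def offer(self, message):
        """Return True if the message was stored, False if the buffer is full."""
        if len(self._items) >= self.capacity:
            self.rejected += 1
            return False
        self._items.append(message)
        return True

    def drain(self, batch_size):
        """Remove and return up to batch_size messages, oldest first."""
        batch = []
        while self._items and len(batch) < batch_size:
            batch.append(self._items.popleft())
        return batch

    def __len__(self):
        return len(self._items)


def simulate(buffer, incoming, output_available):
    """Feed `incoming` messages; drain only while the output is available."""
    for i in range(incoming):
        buffer.offer(f"msg-{i}")
        if output_available:
            buffer.drain(batch_size=1000)
    return len(buffer)


if __name__ == "__main__":
    stalled = BoundedBuffer(capacity=1024)
    held = simulate(stalled, incoming=5000, output_available=False)
    print("held:", held, "rejected:", stalled.rejected)
```

With the output stalled, the buffer never holds more than its capacity and the overflow is counted rather than accumulated, which is the property an unbounded heap cache lacks. A disk or message bus queue, as mentioned in the reply, would be one place to send the rejected messages instead of dropping them.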
