The indexer failures section is a welcome addition, as I'm greeted with quite a few thousand of these messages (upwards of 50K after about 30 minutes):
    RemoteTransportException[[Suicide][inet[/1[ipaddress]:9300]][bulk/shard]];
    nested: EsRejectedExecutionException[rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@2b5b09fb];

At any given time, my Graylog2 box is processing anywhere from 2,500 to 5,500 messages/sec, with occasional spikes of 7K/sec. Right now I have 3.4 billion messages, totaling about 1.8 TB.

I increased my shard count from 4 to 25, restarted, and cycled the deflector. That didn't seem to help. I then found a thread discussing thread count and queue size increases and decided to try that:

http://elasticsearch-users.115913.n3.nabble.com/Understanding-Threadpools-td4028445.html

So here are my custom Elasticsearch performance settings from my own configuration (NOT in Graylog2's configuration). Some of these aren't really needed, but I have so much memory to work with that it doesn't matter:

    indices.memory.index_buffer_size: 30%
    indices.memory.min_shard_index_buffer_size: 12mb
    indices.memory.min_index_buffer_size: 96mb
    index.refresh_interval: 30s
    index.translog.flush_threshold_ops: 5000
    threadpool.bulk.queue_size: 500

The change that actually stopped the rejections was increasing threadpool.bulk.queue_size to 500. The default is 50, and I was still getting an occasional queue-full rejection at 200. I could set it to -1 for an unbounded queue, but that doesn't feel like good practice. The bulk tasks seem to complete within milliseconds, but enough of them are instantiated at the same time to fill up the queue when 50 is simply too small.

After about 15 minutes, here are my cluster stats, where I happened to capture some active bulk queues.
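For anyone curious why the rejections happen even though each bulk task finishes in milliseconds: a fixed-size worker pool with a bounded queue aborts new submissions the instant the queue is full, no matter how fast items drain afterwards. A minimal Python sketch of that behavior (illustrative only, not Elasticsearch code; the capacity mirrors the default threadpool.bulk.queue_size of 50):

```python
import queue
import threading
import time

# Bounded task queue, analogous to threadpool.bulk.queue_size (default 50).
QUEUE_CAPACITY = 50

tasks = queue.Queue(maxsize=QUEUE_CAPACITY)
accepted = 0
rejected = 0

def worker():
    # Drains tasks one at a time, like a busy bulk thread.
    while True:
        item = tasks.get()
        if item is None:
            return
        time.sleep(0.001)  # pretend each bulk task takes ~1 ms
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()

# A burst of 200 submissions arrives faster than the worker drains them.
# Once the queue is full, put_nowait raises queue.Full -- the same shape
# of failure as EsRejectedExecutionException ("queue capacity 50").
for i in range(200):
    try:
        tasks.put_nowait(i)
        accepted += 1
    except queue.Full:
        rejected += 1

tasks.join()          # let the worker finish the accepted tasks
tasks.put(None)       # signal shutdown
t.join()
print(f"accepted={accepted} rejected={rejected}")
```

The burst is rejected mid-flight even though the queue is empty again a fraction of a second later, which matches what the stats below show.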
After a 1/4 second, the queues are empty again:

http://ipaddress:9200/_nodes/thread_pool/stats?pretty=true

    {
      "cluster_name" : "graylog2",
      "nodes" : {
        "c-9rpgQTQI68r91PicxmzA" : {
          "timestamp" : 1393351331175,
          "name" : "graylog2-server",
          "transport_address" : "inet[/XXXXXXXX]",
          "hostname" : "XXXXXXXXXX",
          "attributes" : { "client" : "true", "data" : "false", "master" : "false" },
          "thread_pool" : {
            "generic" :    { "threads" : 1, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 4, "completed" : 258 },
            "index" :      { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "get" :        { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "snapshot" :   { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "merge" :      { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "suggest" :    { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "bulk" :       { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "optimize" :   { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "warmer" :     { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "flush" :      { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "search" :     { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "percolate" :  { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "management" : { "threads" : 2, "queue" : 0, "active" : 1, "rejected" : 0, "largest" : 2, "completed" : 675 },
            "refresh" :    { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 }
          }
        },
        "LIVHf3iGSWuRnOjaA74UPA" : {
          "timestamp" : 1393351331174,
          "name" : "Drake, Frank",
          "transport_address" : "inet[/XXXXXXXXX]",
          "hostname" : "XXXXXXXXX",
          "attributes" : { "master" : "true" },
          "thread_pool" : {
            "generic" :    { "threads" : 4, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 7, "completed" : 3142 },
            "index" :      { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "get" :        { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "snapshot" :   { "threads" : 5, "queue" : 3, "active" : 5, "rejected" : 0, "largest" : 5, "completed" : 3452 },
            "merge" :      { "threads" : 5, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 5, "completed" : 17338 },
            "suggest" :    { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "bulk" :       { "threads" : 16, "queue" : 123, "active" : 15, "rejected" : 0, "largest" : 16, "completed" : 1649520 },
            "optimize" :   { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "warmer" :     { "threads" : 4, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 4, "completed" : 3972 },
            "flush" :      { "threads" : 3, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 3, "completed" : 325 },
            "search" :     { "threads" : 48, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 48, "completed" : 8355 },
            "percolate" :  { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 },
            "management" : { "threads" : 5, "queue" : 0, "active" : 1, "rejected" : 0, "largest" : 5, "completed" : 413367 },
            "refresh" :    { "threads" : 3, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 3, "completed" : 575 }
          }
        }
      }
    }

Great feature. I was apparently losing messages because of an untuned Elasticsearch, and I didn't even know it until this revealed the problem.

--
You received this message because you are subscribed to the Google Groups "graylog2" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
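The stats payload above is also easy to scan programmatically rather than by eye, since every pool exposes the same queue/rejected counters. A rough sketch of that idea (the helper name and trimmed-down sample are mine, not part of Graylog2 or Elasticsearch):

```python
def pools_under_pressure(stats, queue_threshold=0):
    """Yield (node_name, pool_name, pool) for any thread pool whose
    queue is non-empty or that has rejected tasks."""
    for node in stats.get("nodes", {}).values():
        for pool_name, pool in node.get("thread_pool", {}).items():
            if pool.get("queue", 0) > queue_threshold or pool.get("rejected", 0) > 0:
                yield node.get("name", "?"), pool_name, pool

# Trimmed-down sample shaped like the _nodes/thread_pool/stats output above.
sample = {
    "cluster_name": "graylog2",
    "nodes": {
        "LIVHf3iGSWuRnOjaA74UPA": {
            "name": "Drake, Frank",
            "thread_pool": {
                "bulk": {"threads": 16, "queue": 123, "active": 15,
                         "rejected": 0, "largest": 16, "completed": 1649520},
                "search": {"threads": 48, "queue": 0, "active": 0,
                           "rejected": 0, "largest": 48, "completed": 8355},
            },
        },
    },
}

hot = list(pools_under_pressure(sample))
for node_name, pool_name, pool in hot:
    print(f"{node_name}: {pool_name} queue={pool['queue']} rejected={pool['rejected']}")
```

Fed the real stats JSON, this flags the bulk pool's queue of 123 while ignoring the idle pools, which is exactly the signal that pointed at the queue_size fix.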
