Now, how do I clear this indexer failure log?

On Tuesday, February 25, 2014 1:14:27 PM UTC-5, Scotty H wrote:
>
> The indexer failures section is a welcome addition, as I'm greeted with
> quite a few thousand (upwards of 50K after about 30 minutes) of these
> messages:
>
>> RemoteTransportException[[Suicide][inet[/1[ipaddress]:9300]][bulk/shard]];
>> nested: EsRejectedExecutionException[rejected execution (queue capacity 50)
>> on
>> org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@2b5b09fb];
>
> At any given time, my Graylog2 box is processing anywhere from 2,500 to
> 5,500 messages/sec, with occasional spikes of 7K/sec. Right now I have 3.4
> billion messages, totaling about 1.8TB.
>
> I increased my shard count from 4 to 25, restarted, and cycled the
> deflector. That didn't seem to help. I found a thread discussing thread
> count and queue size increases and decided to try that:
>
> http://elasticsearch-users.115913.n3.nabble.com/Understanding-Threadpools-td4028445.html
>
> So here are my custom elasticsearch performance vars from my configuration
> (NOT in graylog2's configuration). Some of these are not really needed, but
> I have so much memory to work with that it doesn't matter:
>
>> indices.memory.index_buffer_size: 30%
>> indices.memory.min_shard_index_buffer_size: 12mb
>> indices.memory.min_index_buffer_size: 96mb
>> index.refresh_interval: 30s
>> index.translog.flush_threshold_ops: 5000
>> threadpool.bulk.queue_size: 500
>
> The relevant change to stop the rejections was increasing my threadpool
> bulk queue_size to 500. The default is 50. I was still getting an
> occasional queue-full rejection at 200. I could set it to -1 to make it
> unbounded, but I feel like that's not a good practice. The bulk tasks seem
> to complete within milliseconds, but enough of them are being
> instantiated at the same time to fill up the queue when 50 is
> simply too small.
>
> After about 15 minutes, here are my cluster stats where I happened to
> capture some active bulk queues. After a 1/4 second, the queues are empty
> again:
>
>> http://ipaddress:9200/_nodes/thread_pool/stats?pretty=true
>>
>> {
>>   "cluster_name" : "graylog2",
>>   "nodes" : {
>>     "c-9rpgQTQI68r91PicxmzA" : {
>>       "timestamp" : 1393351331175,
>>       "name" : "graylog2-server",
>>       "transport_address" : "inet[/XXXXXXXX]",
>>       "hostname" : "XXXXXXXXXX",
>>       "attributes" : {
>>         "client" : "true",
>>         "data" : "false",
>>         "master" : "false"
>>       },
>>       "thread_pool" : {
>>         "generic" : {
>>           "threads" : 1,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 4,
>>           "completed" : 258
>>         },
>>         "index" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "get" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "snapshot" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "merge" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "suggest" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "bulk" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "optimize" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "warmer" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "flush" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "search" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "percolate" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "management" : {
>>           "threads" : 2,
>>           "queue" : 0,
>>           "active" : 1,
>>           "rejected" : 0,
>>           "largest" : 2,
>>           "completed" : 675
>>         },
>>         "refresh" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         }
>>       }
>>     },
>>     "LIVHf3iGSWuRnOjaA74UPA" : {
>>       "timestamp" : 1393351331174,
>>       "name" : "Drake, Frank",
>>       "transport_address" : "inet[/XXXXXXXXX]",
>>       "hostname" : "XXXXXXXXX",
>>       "attributes" : {
>>         "master" : "true"
>>       },
>>       "thread_pool" : {
>>         "generic" : {
>>           "threads" : 4,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 7,
>>           "completed" : 3142
>>         },
>>         "index" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "get" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "snapshot" : {
>>           "threads" : 5,
>>           "queue" : 3,
>>           "active" : 5,
>>           "rejected" : 0,
>>           "largest" : 5,
>>           "completed" : 3452
>>         },
>>         "merge" : {
>>           "threads" : 5,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 5,
>>           "completed" : 17338
>>         },
>>         "suggest" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "bulk" : {
>>           "threads" : 16,
>>           "queue" : 123,
>>           "active" : 15,
>>           "rejected" : 0,
>>           "largest" : 16,
>>           "completed" : 1649520
>>         },
>>         "optimize" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "warmer" : {
>>           "threads" : 4,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 4,
>>           "completed" : 3972
>>         },
>>         "flush" : {
>>           "threads" : 3,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 3,
>>           "completed" : 325
>>         },
>>         "search" : {
>>           "threads" : 48,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 48,
>>           "completed" : 8355
>>         },
>>         "percolate" : {
>>           "threads" : 0,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 0,
>>           "completed" : 0
>>         },
>>         "management" : {
>>           "threads" : 5,
>>           "queue" : 0,
>>           "active" : 1,
>>           "rejected" : 0,
>>           "largest" : 5,
>>           "completed" : 413367
>>         },
>>         "refresh" : {
>>           "threads" : 3,
>>           "queue" : 0,
>>           "active" : 0,
>>           "rejected" : 0,
>>           "largest" : 3,
>>           "completed" : 575
>>         }
>>       }
>>     }
>>   }
>> }
>
> Great feature. I was apparently losing messages because of an un-tuned
> elasticsearch, and I didn't even know it until this revealed the problem.
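For anyone who wants to keep an eye on the same symptom, here is a minimal Python sketch, assuming the response from _nodes/thread_pool/stats has the shape shown in the quoted output. The helper names bulk_pool_summary and nodes_with_rejections are hypothetical, and the sample data is a trimmed copy of the bulk values quoted in this thread rather than a live query. It only pulls out the queue, active, and rejected counters of each node's bulk pool, which are the numbers that matter when deciding whether threadpool.bulk.queue_size is large enough.

```python
from typing import Any, Dict, List, Tuple


def bulk_pool_summary(stats: Dict[str, Any]) -> List[Tuple[str, int, int, int]]:
    """Return (node name, queue, active, rejected) for each node's bulk pool."""
    rows = []
    for node in stats.get("nodes", {}).values():
        bulk = node.get("thread_pool", {}).get("bulk")
        if bulk is None:
            # Node reported no bulk pool; skip it.
            continue
        rows.append(
            (node.get("name", "?"), bulk["queue"], bulk["active"], bulk["rejected"])
        )
    return rows


def nodes_with_rejections(stats: Dict[str, Any]) -> List[str]:
    """Return the names of nodes whose bulk pool has rejected at least one task."""
    return [name for name, _q, _a, rejected in bulk_pool_summary(stats) if rejected > 0]


# Trimmed sample: only the bulk pool values quoted in this thread.
sample = {
    "cluster_name": "graylog2",
    "nodes": {
        "c-9rpgQTQI68r91PicxmzA": {
            "name": "graylog2-server",
            "thread_pool": {
                "bulk": {"threads": 0, "queue": 0, "active": 0,
                         "rejected": 0, "largest": 0, "completed": 0}
            },
        },
        "LIVHf3iGSWuRnOjaA74UPA": {
            "name": "Drake, Frank",
            "thread_pool": {
                "bulk": {"threads": 16, "queue": 123, "active": 15,
                         "rejected": 0, "largest": 16, "completed": 1649520}
            },
        },
    },
}

if __name__ == "__main__":
    for name, queue, active, rejected in bulk_pool_summary(sample):
        print(f"{name}: queue={queue} active={active} rejected={rejected}")
```

With the quoted numbers, the data node shows 123 queued bulk tasks and zero rejections, which matches the observation that a queue of 50 would have overflowed while 500 has headroom.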
