2016-03-25T20:47:35.346Z WARN [IndexHelper] Couldn't find latest deflector target index org.graylog2.database.NotFoundException: Index range for index <graylog2_185> not found.
I cannot reach the 'indices' page for further information, so I attempted to manually cycle the deflector via an API call. At that point I get:

    2016-03-25T20:47:37.985Z ERROR [IndexRotationThread] Couldn't point deflector to a new index org.elasticsearch.ElasticsearchTimeoutException: Timeout waiting for task.

That makes it seem like an ElasticSearch problem, so I checked the ElasticSearch status on the Graylog web UI and it says Green. Just in case it's lying:

    [root@graylog graylog-server]# curl -XGET 'http://10.200.2.120:9200/_cluster/health?pretty=true'
    {
      "cluster_name" : "graylog-production",
      "status" : "green",
      "timed_out" : false,
      "number_of_nodes" : 4,
      "number_of_data_nodes" : 3,
      "active_primary_shards" : 561,
      "active_shards" : 562,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0
    }

So I stopped the Graylog server, stopped all three ElasticSearch nodes, restarted all three ElasticSearch nodes, and waited for them to re-form their constellation and go green again. Then I started the Graylog server again. It promptly started processing data, then about 15 minutes later quit processing again (started shoveling everything into mongo and taking nothing out), and the logs show the same problem with it not finding the latest deflector target. Meanwhile, querying ElasticSearch via the curl API still works fine, so I seriously doubt it's ElasticSearch.

There's a *reason* why I gave ElasticSearch three nodes with the redundancy level set to 2: I initially ran into problems with ElasticSearch not being reliable, so I gave it double redundancy across three nodes so that no matter what, the data should remain available. At that point ElasticSearch ceased to be a reliability issue.

Frankly, I am pretty discouraged right now.
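For reference, here is roughly what my manual cycle attempt looked like. The host, port, and credentials below are placeholders for my setup, and the endpoint paths are what the Graylog REST API browser shows for this version; verify them against your own install before running anything:

    # Check which index the deflector currently points to
    # (assumes the Graylog REST API on its default port 12900; substitute real credentials)
    curl -u admin:password -XGET 'http://10.200.2.120:12900/system/deflector'

    # Manually cycle the deflector to a new index -- this is the call that
    # timed out with ElasticsearchTimeoutException for me
    curl -u admin:password -XPOST 'http://10.200.2.120:12900/system/deflector/cycle'

    # Since the WARN complains about a missing index range for graylog2_185,
    # recalculating index ranges may also be worth trying (endpoint name per the
    # Graylog API browser; check it exists in your version)
    curl -u admin:password -XPOST 'http://10.200.2.120:12900/system/indices/ranges/rebuild'
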
My Nagios logs show that the longest Graylog has stayed up and running in the past month is 7 hours before it became unresponsive and had to be restarted. And now there's this random cessation of processing, where the server is still responsive but just won't save anything into ElasticSearch. Granted, I'm throwing a fair amount of data at Graylog, around 1200 messages per second, but Splunk wasn't even breathing hard at that load even though it was running on 1/3rd the hardware. Unless I can figure out why Graylog is incapable of handling the load without falling over, I guess I'll have to accept that it's worth what I paid for it (i.e., nothing) and find some other solution, sigh...
