2016-03-25T20:47:35.346Z WARN  [IndexHelper] Couldn't find latest deflector target index
org.graylog2.database.NotFoundException: Index range for index <graylog2_185> not found.
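
For what it's worth, a missing index range can usually be recalculated through the Graylog REST API's ranges-rebuild endpoint (host, port, and credentials below are placeholders for my setup, not the actual values):

```shell
# Recalculate index ranges via the Graylog REST API.
# 12900 is the default REST API port; adjust host/credentials to your setup.
curl -XPOST -u admin:password \
  'http://10.200.2.120:12900/system/indices/ranges/rebuild'
```

That didn't help here, though, since the server kept logging the same warning.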

I cannot reach the 'Indices' page for further information, so I attempted to manually cycle the deflector via an API call, at which point I get:
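
The cycle call looks roughly like this (again, host, port, and credentials are placeholders):

```shell
# Manually point the deflector at a new index via the Graylog REST API.
curl -XPOST -u admin:password \
  'http://10.200.2.120:12900/system/deflector/cycle'
```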

2016-03-25T20:47:37.985Z ERROR [IndexRotationThread] Couldn't point deflector to a new index
org.elasticsearch.ElasticsearchTimeoutException: Timeout waiting for task.

That makes it seem like an Elasticsearch problem, so I checked the Elasticsearch status in the Graylog web UI, and it says Green. Just in case it's lying...

[root@graylog graylog-server]# curl -XGET 'http://10.200.2.120:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "graylog-production",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 561,
  "active_shards" : 562,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}

So I stopped the Graylog server, stopped all three Elasticsearch nodes, restarted all three Elasticsearch nodes, and waited for them to re-form their constellation and go green again. Then I started the Graylog server again. It promptly started processing data, then about 15 minutes later quit processing again (it started shoveling everything into MongoDB and taking nothing out), and the logs show the same problem with it not finding the latest deflector target.

Meanwhile, querying Elasticsearch via curl still works fine... so I seriously doubt it's Elasticsearch. There's a *reason* I gave Elasticsearch three nodes with the redundancy level set to 2 -- I initially ran into problems with Elasticsearch not being reliable, so I gave it double redundancy across three nodes so that no matter what, the data should remain available. At that point Elasticsearch ceased to be a reliability issue.

Frankly, I am pretty discouraged right now. My Nagios logs show that the longest Graylog has stayed up and running in the past month is 7 hours before it became unresponsive and had to be restarted. And now there's this random cessation of processing, where the server is still responsive but just won't save anything into Elasticsearch. Granted, I'm throwing a fair amount of data at Graylog -- around 1,200 messages per second -- but Splunk wasn't even breathing hard at that load, even though it was running on a third of the hardware. Unless I can figure out why Graylog is incapable of handling the load without falling over, I guess I'll have to accept that it's worth what I paid for it (i.e., nothing) and find some other solution, sigh...
