2016-03-25T20:47:35.346Z WARN  [IndexHelper] Couldn't find latest deflector target index
org.graylog2.database.NotFoundException: Index range for index <graylog2_185> not found.
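
For what it's worth, a missing index range can usually be recalculated through the Graylog REST API's ranges-rebuild endpoint (host, port, and credentials below are placeholders for my setup, not the actual values):

```shell
# Recalculate index ranges via the Graylog REST API.
# 12900 is the default REST API port; adjust host/credentials to your setup.
curl -XPOST -u admin:password \
  'http://10.200.2.120:12900/system/indices/ranges/rebuild'
```

That didn't help here, though, since the server kept logging the same warning.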

I cannot reach the 'Indices' page for further information, so I attempted to manually cycle the deflector via an API call, at which point I get:
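
The cycle call looks roughly like this (again, host, port, and credentials are placeholders):

```shell
# Manually point the deflector at a new index via the Graylog REST API.
curl -XPOST -u admin:password \
  'http://10.200.2.120:12900/system/deflector/cycle'
```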

2016-03-25T20:47:37.985Z ERROR [IndexRotationThread] Couldn't point deflector to a new index
org.elasticsearch.ElasticsearchTimeoutException: Timeout waiting for task.

That makes it seem like an Elasticsearch problem, so I checked the Elasticsearch status in the Graylog web UI, and it says Green. Just in case it's lying...

[root@graylog graylog-server]# curl -XGET 'http://10.200.2.120:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "graylog-production",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 561,
  "active_shards" : 562,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}

So I stopped the Graylog server, stopped all three Elasticsearch nodes, restarted all three Elasticsearch nodes, and waited for them to re-form their constellation and go green again. Then I started the Graylog server again. It promptly started processing data, then about 15 minutes later quit processing again (it started shoveling everything into MongoDB and taking nothing out), and the logs show the same problem with it not finding the latest deflector target.

Meanwhile, querying Elasticsearch via curl still works fine... so I seriously doubt it's Elasticsearch. There's a *reason* I gave Elasticsearch three nodes with the redundancy level set to 2 -- I initially ran into problems with Elasticsearch not being reliable, so I gave it double redundancy across three nodes so that no matter what, the data should remain available. At that point Elasticsearch ceased to be a reliability issue.

Frankly, I am pretty discouraged right now. My Nagios logs show that the longest Graylog has stayed up and running in the past month is 7 hours before it became unresponsive and had to be restarted. And now there's this random cessation of processing, where the server is still responsive but just won't save anything into Elasticsearch. Granted, I'm throwing a fair amount of data at Graylog -- around 1,200 messages per second -- but Splunk wasn't even breathing hard at that load, even though it was running on a third of the hardware. Unless I can figure out why Graylog is incapable of handling the load without falling over, I guess I'll have to accept that it's worth what I paid for it (i.e., nothing) and find some other solution, sigh...
