> On Mar 26, 2016, at 04:36, Jochen Schalanda <[email protected]> wrote:
>
> Hi Eric,
>
> which version of Elasticsearch and which version of Graylog are you using?
> Are there any (detailed) error messages in either the logs of your
> Elasticsearch nodes or your Graylog server nodes?
Centos 6 is the OS. Using the RPMs from Elastic for ElasticSearch:

[root@graylog egreen]# rpm -qa | grep elastic
elasticsearch-1.7.5-1.noarch

No errors in the ElasticSearch logs on any of the three nodes; they all think they're happily chugging along.

Mongo is from mongodb.org: mongodb-org-3.2.4-1.el6.x86_64

Graylog is from graylog.org: graylog-server-1.3.4-1.noarch

Misc:

[root@graylog egreen]# java -version
java version "1.7.0_99"
OpenJDK Runtime Environment (rhel-2.6.5.0.el6_7-x86_64 u99-b00)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)

One thing I decided was a Clue(tm) was that I kept getting messages that garbage collections were taking too long, so I switched to the G1 garbage collector. That reduced the number of garbage-collection messages, but I still got one from time to time. I also decided that the resource usage of keeping 200 indices (around 6 weeks of data for my cloud) might be too much, so I chopped it down to 100 indices max. And finally, because I noticed that after a while the Java virtual machine seemed to accumulate a lot of cruft, I upped its memory from 1.7 GB to 2.0 GB.

Graylog stayed up overnight, which is promising. I guess I just need to throw more resources at Graylog. I'll rearrange things to move ElasticSearch off the Graylog server (it's using 1.6 GB of resident memory right now) and give that memory to Graylog instead, and see if Graylog can then handle the full 6 weeks of data I was trying to retain. That virtual machine is a bit loaded anyhow: it gets syslog-ng data from both my production cloud and my R&D servers, as well as running Graylog's MongoDB and, of course, Graylog itself.
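For anyone wanting to replicate those three tweaks, here is roughly where they live on a CentOS 6 RPM install. This is a sketch from memory, not taken from the thread: the file paths, the GRAYLOG_SERVER_JAVA_OPTS variable name, and the option names are assumptions about the standard 1.x packaging, so verify them against your own files before copying anything.

```shell
# Hedged sketch of the tweaks described above, for a CentOS 6 RPM install.
# File locations and variable names are assumptions -- check your own
# package's defaults before relying on them.

# 1. Heap size and garbage collector for graylog-server
#    (typically set in /etc/sysconfig/graylog-server on RPM systems):
GRAYLOG_SERVER_JAVA_OPTS="-Xms2g -Xmx2g -XX:+UseG1GC"

# 2. Index retention for Graylog 1.x
#    (typically /etc/graylog/server/server.conf):
#    elasticsearch_max_number_of_indices = 100
#    retention_strategy = delete
```

Restarting graylog-server after either change is required for it to take effect.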
> Cheers,
> Jochen
>
> On Friday, 25 March 2016 22:32:08 UTC+1, Eric Green wrote:
>
> 2016-03-25T20:47:35.346Z WARN [IndexHelper] Couldn't find latest deflector target index
> org.graylog2.database.NotFoundException: Index range for index <graylog2_185> not found.
>
> I cannot reach the 'indices' page for further information, so I attempted to manually cycle via an API call. At which point I get:
>
> 2016-03-25T20:47:37.985Z ERROR [IndexRotationThread] Couldn't point deflector to a new index
> org.elasticsearch.ElasticsearchTimeoutException: Timeout waiting for task.
>
> Which makes it seem like an ElasticSearch problem, so I check the ElasticSearch status on the Graylog web UI and it says Green. Just in case it's lying...
>
> [root@graylog graylog-server]# curl -XGET 'http://10.200.2.120:9200/_cluster/health?pretty=true'
> {
>   "cluster_name" : "graylog-production",
>   "status" : "green",
>   "timed_out" : false,
>   "number_of_nodes" : 4,
>   "number_of_data_nodes" : 3,
>   "active_primary_shards" : 561,
>   "active_shards" : 562,
>   "relocating_shards" : 0,
>   "initializing_shards" : 0,
>   "unassigned_shards" : 0,
>   "delayed_unassigned_shards" : 0,
>   "number_of_pending_tasks" : 0,
>   "number_of_in_flight_fetch" : 0
> }
>
> So I stopped the graylog server, stopped all three ElasticSearch nodes, restarted all three ElasticSearch nodes and waited for them to re-form their constellation and go green again. Then I started the graylog server again. It promptly started processing data again, then about 15 minutes later quit processing again (started shoveling everything into mongo and taking nothing out), and the logs show the same problem with it not finding the latest deflector target.
>
> Meanwhile, querying ElasticSearch via the curl API still works fine... so I seriously doubt it's ElasticSearch.
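For what it's worth, the two operations discussed in that quoted message (recalculating index ranges and cycling the deflector) can also be driven directly from the Graylog 1.x REST API. A sketch from memory: the host, the 1.x default API port of 12900, and the admin credentials are all assumptions, not details from the thread, so adjust them to your setup.

```shell
# Assumptions: Graylog 1.x REST API on its default port 12900, running
# locally, reachable with admin credentials -- adjust to your setup.

# Recalculate the index ranges that the "Index range for index <...>
# not found" warning complains about:
curl -u admin:password -XPOST 'http://127.0.0.1:12900/system/indices/ranges/rebuild'

# Manually cycle the deflector to a new index:
curl -u admin:password -XPOST 'http://127.0.0.1:12900/system/deflector/cycle'
```

Note that if the deflector cycle is timing out inside Graylog (the ElasticsearchTimeoutException above), the same cycle triggered over the API may time out too; the index-range rebuild runs as a system job and is usually the safer first step.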
> There's a *reason* why I gave ElasticSearch three nodes with a redundancy level set to 2 -- I initially ran into problems with ElasticSearch not being reliable, so I gave it double redundancy across three nodes so that no matter what, the data should remain available. At that point ElasticSearch ceased to be a reliability issue.
>
> Frankly, I am pretty discouraged right now. My Nagios logs show that the longest graylog has remained up and running in the past month is 7 hours before it became unresponsive and had to be restarted. Then there's this random cessation of processing data, where the server is still responsive but just won't save into ElasticSearch. Granted, I'm throwing a fair amount of data at Graylog, around 1200 messages per second, but Splunk wasn't even breathing hard at that load, even though Splunk was running on 1/3rd the hardware. Unless I can figure out why graylog is ridiculously incapable of handling the load without falling over, I guess I'll have to accept it's worth what I paid for it (i.e., nothing) and find some other solution, sigh...
--
You received this message because you are subscribed to the Google Groups "Graylog Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/graylog2/95023CAC-7EBF-4CD1-AB73-4E4381C2C276%40gmail.com.
For more options, visit https://groups.google.com/d/optout.
