> On Mar 26, 2016, at 04:36, Jochen Schalanda <[email protected]> wrote:
>
> Hi Eric,
>
> which version of Elasticsearch and which version of Graylog are you using?
> Are there any (detailed) error messages in either the logs of your
> Elasticsearch nodes or your Graylog server nodes?
Centos 6 is the OS. Using the RPMs from Elastic for ElasticSearch:

[root@graylog egreen]# rpm -qa | grep elastic
elasticsearch-1.7.5-1.noarch

No errors in the ElasticSearch logs on any of the three nodes; they all think they're happily chugging along.

Mongo is from mongodb.org: mongodb-org-3.2.4-1.el6.x86_64

Graylog is from graylog.org: graylog-server-1.3.4-1.noarch

Misc:

[root@graylog egreen]# java -version
java version "1.7.0_99"
OpenJDK Runtime Environment (rhel-2.6.5.0.el6_7-x86_64 u99-b00)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)

One thing I decided was a Clue(tm) was that I kept getting messages that garbage collections were taking too long, so I switched to the G1 garbage collector. That reduced the number of garbage-collection messages, but I still got one from time to time. I also decided that the resource usage of keeping 200 indices (around 6 weeks of data for my cloud) might be too much, so I chopped it down to 100 indices max. And finally, because I noticed that after a while the Java virtual machine seemed to accumulate a lot of cruft, I upped its memory from 1.7 GB to 2.0 GB.

Graylog stayed up overnight, which is promising. I guess I just need to throw more resources at Graylog. I'll rearrange things to move ElasticSearch off the Graylog server (it's using 1.6 GB of resident memory right now) and give that memory to Graylog instead, and see if Graylog can then handle the full 6 weeks of data I was trying to retain. That virtual machine is a bit loaded anyhow: it gets syslog-ng data from both my production cloud and my R&D servers, as well as running Graylog's MongoDB and, of course, Graylog itself.
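For anyone wanting to replicate those three tweaks, here is roughly where they live on a CentOS 6 RPM install. This is a sketch from memory, not taken from the thread: the file paths, the GRAYLOG_SERVER_JAVA_OPTS variable name, and the option names are assumptions about the standard 1.x packaging, so verify them against your own files before copying anything.

```shell
# Hedged sketch of the tweaks described above, for a CentOS 6 RPM install.
# File locations and variable names are assumptions -- check your own
# package's defaults before relying on them.

# 1. Heap size and garbage collector for graylog-server
#    (typically set in /etc/sysconfig/graylog-server on RPM systems):
GRAYLOG_SERVER_JAVA_OPTS="-Xms2g -Xmx2g -XX:+UseG1GC"

# 2. Index retention for Graylog 1.x
#    (typically /etc/graylog/server/server.conf):
#    elasticsearch_max_number_of_indices = 100
#    retention_strategy = delete
```

Restarting graylog-server after either change is required for it to take effect.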
> Cheers,
> Jochen
>
> On Friday, 25 March 2016 22:32:08 UTC+1, Eric Green wrote:
>
> 2016-03-25T20:47:35.346Z WARN [IndexHelper] Couldn't find latest deflector target index
> org.graylog2.database.NotFoundException: Index range for index <graylog2_185> not found.
>
> I cannot reach the 'indices' page for further information, so I attempted to manually cycle via an API call. At which point I get:
>
> 2016-03-25T20:47:37.985Z ERROR [IndexRotationThread] Couldn't point deflector to a new index
> org.elasticsearch.ElasticsearchTimeoutException: Timeout waiting for task.
>
> Which makes it seem like an ElasticSearch problem, so I check the ElasticSearch status on the Graylog web UI and it says Green. Just in case it's lying...
>
> [root@graylog graylog-server]# curl -XGET 'http://10.200.2.120:9200/_cluster/health?pretty=true'
> {
>   "cluster_name" : "graylog-production",
>   "status" : "green",
>   "timed_out" : false,
>   "number_of_nodes" : 4,
>   "number_of_data_nodes" : 3,
>   "active_primary_shards" : 561,
>   "active_shards" : 562,
>   "relocating_shards" : 0,
>   "initializing_shards" : 0,
>   "unassigned_shards" : 0,
>   "delayed_unassigned_shards" : 0,
>   "number_of_pending_tasks" : 0,
>   "number_of_in_flight_fetch" : 0
> }
>
> So I stopped the graylog server, stopped all three ElasticSearch nodes, restarted all three ElasticSearch nodes and waited for them to re-form their constellation and go green again. Then I started the graylog server again. It promptly started processing data again, then about 15 minutes later quit processing again (started shoveling everything into mongo and taking nothing out), and the logs show the same problem with it not finding the latest deflector target.
>
> Meanwhile, querying ElasticSearch via the curl API still works fine... so I seriously doubt it's ElasticSearch.
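For what it's worth, the two operations discussed in that quoted message (recalculating index ranges and cycling the deflector) can also be driven directly from the Graylog 1.x REST API. A sketch from memory: the host, the 1.x default API port of 12900, and the admin credentials are all assumptions, not details from the thread, so adjust them to your setup.

```shell
# Assumptions: Graylog 1.x REST API on its default port 12900, running
# locally, reachable with admin credentials -- adjust to your setup.

# Recalculate the index ranges that the "Index range for index <...>
# not found" warning complains about:
curl -u admin:password -XPOST 'http://127.0.0.1:12900/system/indices/ranges/rebuild'

# Manually cycle the deflector to a new index:
curl -u admin:password -XPOST 'http://127.0.0.1:12900/system/deflector/cycle'
```

Note that if the deflector cycle is timing out inside Graylog (the ElasticsearchTimeoutException above), the same cycle triggered over the API may time out too; the index-range rebuild runs as a system job and is usually the safer first step.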
> There's a *reason* why I gave ElasticSearch three nodes with a redundancy level set to 2 -- I initially ran into problems with ElasticSearch not being reliable, so I gave it double redundancy across three nodes so that no matter what, the data should remain available. At that point ElasticSearch ceased to be a reliability issue.
>
> Frankly, I am pretty discouraged right now. My Nagios logs show that the longest graylog has remained up and running in the past month is 7 hours before it became unresponsive and had to be restarted. Then there's this random cessation of processing data, where the server is still responsive but just won't save into ElasticSearch. Granted, I'm throwing a fair amount of data at Graylog, around 1200 messages per second, but Splunk wasn't even breathing hard at that load, even though Splunk was running on 1/3rd the hardware. Unless I can figure out why graylog is ridiculously incapable of handling the load without falling over, I guess I'll have to accept it's worth what I paid for it (i.e., nothing) and find some other solution, sigh...
--
You received this message because you are subscribed to the Google Groups "Graylog Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/graylog2/95023CAC-7EBF-4CD1-AB73-4E4381C2C276%40gmail.com.
For more options, visit https://groups.google.com/d/optout.
