Hi all,

We have a setup of two Graylog servers (1.1.6), both running ES 1.7.1, in a redundant setup behind a load balancer.
When we search over a longer period of time (e.g. a one-month search, which involves approximately 300 million messages), we have several times triggered an exception in the web interface. In the worst case this caused either the Graylog server process or Elasticsearch to fail, and we had to restart those services.

Yesterday such an exception occurred during a search, after which Graylog could no longer write to ES and started filling up its internal journal. After we restarted ES and ES recovered the indices, the Graylog journal was flushed to ES.

Unfortunately, when we now search and look at the histogram, we don't see any messages for the short period during which the outage happened. We already tried recalculating the index ranges (which completed successfully), but the messages still don't show up.

We could clearly see that messages were queued in Graylog's journal (> 100 K messages during the few-minute window) and then flushed to ES, so we believe the messages were actually stored in ES but Graylog is somehow unable to see them.

How can we investigate this? It concerns us that messages could be lost even though Graylog's journal was in use at the time of the error.

Thanks

Best regards,
Marcel
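
One way to narrow this down is to ask ES directly how many documents it holds for the outage window, bypassing Graylog and its index ranges. The following is only a minimal sketch under several assumptions that are not confirmed above: that ES is reachable over HTTP (for example at http://localhost:9200), that the indices match the pattern graylog2_*, and that messages carry a "timestamp" field in UTC formatted as "yyyy-MM-dd HH:mm:ss.SSS". The function names are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def build_count_query(start, end):
    """Build a request body that matches messages with start <= timestamp < end.

    Assumption: the field is named "timestamp" and the given strings use the
    same date format as the index mapping (e.g. "yyyy-MM-dd HH:mm:ss.SSS").
    """
    return {"query": {"range": {"timestamp": {"gte": start, "lt": end}}}}


def count_messages(es_url, index_pattern, start, end):
    """Ask the ES _count endpoint how many documents fall into the window.

    Assumption: es_url and index_pattern are placeholders for your own
    setup, e.g. "http://localhost:9200" and "graylog2_*".
    """
    body = json.dumps(build_count_query(start, end)).encode("utf-8")
    request = Request(
        f"{es_url}/{index_pattern}/_count",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)["count"]
```

Calling count_messages with the start and end of the outage window (in UTC) would show whether ES holds documents for that period. A non-zero count would suggest the messages were stored and the problem lies in how Graylog selects indices for the search; a count of zero would suggest the messages were never indexed, or were indexed with a different timestamp than expected.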
