Hi all

We have a setup of two Graylog servers (1.1.6), both of which run ES 
1.7.1, in a redundant setup behind a load balancer.

When we search over a longer period of time (e.g. a one-month search, which 
involves approximately 300 million messages), we have several times gotten 
an exception in the web interface that, in the worst case, caused either 
the Graylog server process or Elasticsearch to fail and required 
restarting those services.

Yesterday such an exception occurred during a search, after which Graylog 
could no longer write to ES and started filling up its internal journal. 
After we restarted ES and ES recovered the indexes, the Graylog journal was 
flushed to ES. Unfortunately, when we now search and look at the histogram, 
we don't see any messages for the short period during which the outage 
happened.

We already tried recalculating the index ranges (completed successfully), 
but the messages still don't show up. As we could clearly see that messages 
were queued in GL's journal (> 100 K messages during the few-minute window) 
and then flushed to ES, we believe that the messages were actually stored 
in ES, but that GL is somehow unable to see them.

How can we investigate this? It concerns us that messages could be lost 
even though GL's journal was in use at the time of the error.
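
A minimal sketch of one check that bypasses Graylog entirely: ask ES 
directly how many documents fall inside the outage window. The ES address, 
the "graylog2" index prefix (Graylog's default, which may differ in our 
config) and the example times are assumptions, not values taken from our 
setup. The sketch relies on Graylog storing its "timestamp" field in UTC as 
"yyyy-MM-dd HH:mm:ss.SSS".

```python
import json
import urllib.request
from datetime import datetime

# Assumptions (hypothetical, adjust to the real environment):
ES_URL = "http://localhost:9200"   # address of one ES node
INDEX_PATTERN = "graylog2_*"       # Graylog's default index prefix


def graylog_ts(dt):
    """Format a UTC datetime like Graylog's stored `timestamp` field."""
    millis = dt.microsecond // 1000
    return dt.strftime("%Y-%m-%d %H:%M:%S.") + "%03d" % millis


def build_count_request(start, end):
    """Return (url, body) for an ES _count request over [start, end]."""
    url = "%s/%s/_count" % (ES_URL, INDEX_PATTERN)
    body = {
        "query": {
            "range": {
                "timestamp": {
                    "gte": graylog_ts(start),
                    "lte": graylog_ts(end),
                }
            }
        }
    }
    return url, body


def count_messages(start, end):
    """Send the count request to ES and return the number of matches."""
    url, body = build_count_request(start, end)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["count"]


# Example call (needs a reachable ES node; the window is made up):
# count_messages(datetime(2015, 9, 1, 12, 0), datetime(2015, 9, 1, 12, 10))
```

If this returns a count well above zero for the outage window, the messages 
are in ES and the problem is on the Graylog side (for example index ranges 
or time zone handling). A count near zero would suggest that the journal 
contents never reached the index.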

Thanks

Best regards,
Marcel
