Hello!

TL;DR: Our recent Elasticsearch cluster restart did not go as planned.
The most important lesson learned: we did not understand the recovery
settings correctly.

Yesterday, we did a cold restart of the Elasticsearch / cirrus eqiad
cluster. This restart did not go as planned. It had no user-facing
impact, since we moved all the traffic to codfw before the restart. It
did impact Logstash (more on that in a different report).

Incident documentation:
https://wikitech.wikimedia.org/wiki/Incident_documentation/20170920-Elasticsearch

Have fun!

   Guillaume

-- 
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
UTC+2 / CEST

_______________________________________________
Discovery mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/discovery
