Hello! TL;DR: Our recent Elasticsearch cluster restart did not go as planned. The most important lesson learned: we did not understand the recovery settings correctly.
Yesterday, we did a cold restart of the Elasticsearch / cirrus eqiad cluster. This restart did not go as planned. It had no user-facing impact, since we had moved all traffic to codfw before the restart. It did impact Logstash (more on that in a separate report).

Incident documentation: https://wikitech.wikimedia.org/wiki/Incident_documentation/20170920-Elasticsearch

Have fun!

Guillaume

--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
UTC+2 / CEST
