jenkins-bot has submitted this change and it was merged.

Change subject: Minor improvements to script to restart elasticsearch cluster
......................................................................


Minor improvements to script to restart elasticsearch cluster

Ugly but working: will try disabling / enabling replication until
it succeeds. This is needed on eqiad, which has occasional timeouts.

This script is meant to be run under human supervision anyway, so
less than ideal error handling (e.g. no max retries) is not a blocker.
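A bounded alternative to the bare `until` loops could be sketched as follows. This is only an illustration, assuming a hypothetical `retry` helper that is not part of restart-cluster.bash: it takes a maximum number of attempts, a delay in seconds and the command to run, and returns the command's last exit status once the attempts are exhausted. Because the script runs with `set -e`, a failing `retry` call used as a plain statement would stop the restart and leave the operator to investigate.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical helper (not in restart-cluster.bash):
#   retry <max_attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or max_attempts is reached,
# then returns the exit status of the last failed attempt.
retry() {
    local max_attempts=$1
    local delay=$2
    shift 2
    local attempt=1
    local status
    while true; do
        # Running the command as an `if` condition keeps `set -e`
        # from aborting the script on a failed attempt.
        if "$@"; then
            return 0
        else
            status=$?
        fi
        if (( attempt >= max_attempts )); then
            echo "giving up after ${attempt} attempts: $*" >&2
            return "${status}"
        fi
        echo "attempt ${attempt} failed, trying again in ${delay}s" >&2
        attempt=$(( attempt + 1 ))
        sleep "${delay}"
    done
}

# Possible use inside the restart loop (5 attempts, 10s apart):
# retry 5 10 ssh "${server}" es-tool stop-replication
```

The same helper could wrap the health check as well, since the `until curl ... | grep green` polling loop has no upper bound either.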

Change-Id: I7d294389c82fd313ad5a6f2d9f01da0dd2ad0307
---
M maintenance/elasticsearch-scripts/restart-cluster.bash
1 file changed, 13 insertions(+), 7 deletions(-)

Approvals:
  EBernhardson: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/maintenance/elasticsearch-scripts/restart-cluster.bash b/maintenance/elasticsearch-scripts/restart-cluster.bash
index bca4f8b..9ce4b57 100755
--- a/maintenance/elasticsearch-scripts/restart-cluster.bash
+++ b/maintenance/elasticsearch-scripts/restart-cluster.bash
@@ -1,10 +1,10 @@
 #!/usr/bin/env bash
 set -e
 
-es_server_prefix=elastic20
-es_server_suffix=.codfw.wmnet
-first_server_index=1
-nb_of_servers_in_cluster=24
+es_server_prefix=elastic10
+es_server_suffix=.eqiad.wmnet
+first_server_index=7
+nb_of_servers_in_cluster=31
 
 for i in $(seq -w ${first_server_index} ${nb_of_servers_in_cluster}); do
     server="${es_server_prefix}${i}${es_server_suffix}"
@@ -17,7 +17,10 @@
     read
 
     echo "disabling replication"
-    ssh ${server} es-tool stop-replication
+    until ssh ${server} es-tool stop-replication
+    do
+        echo "failed to stop replication, trying again"
+    done
     echo "flushing markers"
     ssh ${server} curl -s -XPOST '127.0.0.1:9200/_flush/synced?pretty'
 
@@ -39,9 +42,12 @@
     echo "elasticsearch is started"
 
     echo "enabling replication"
-    ssh ${server} es-tool start-replication
+    until ssh ${server} es-tool start-replication
+    do
+        echo "failed to start replication, trying again"
+    done
 
-    echo "waiting for server recovery"
+    echo "waiting for cluster recovery"
     ssh ${server} "until curl -s 127.0.0.1:9200/_cat/health | grep green; do echo -n .; sleep 10; done"
 
     echo "${server} upgraded, please test"

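The new eqiad settings rely on `seq -w` to zero-pad the server index, so the loop visits elastic1007.eqiad.wmnet through elastic1031.eqiad.wmnet; despite its name, `nb_of_servers_in_cluster` acts as the last index rather than a count. The sketch below is a dry run that only prints the host names the loop would visit, reusing the variable values from the diff above; `list_hosts` is a hypothetical function name and nothing here contacts a server.

```shell
#!/usr/bin/env bash
set -e

# Values copied from the eqiad settings in the diff.
es_server_prefix=elastic10
es_server_suffix=.eqiad.wmnet
first_server_index=7
nb_of_servers_in_cluster=31

# Hypothetical dry-run helper: print each host the restart loop
# would visit, without running ssh. `seq -w` pads every number to
# the same width, so 7 becomes 07.
list_hosts() {
    local i
    for i in $(seq -w "${first_server_index}" "${nb_of_servers_in_cluster}"); do
        echo "${es_server_prefix}${i}${es_server_suffix}"
    done
}

list_hosts
```

Running it before a restart shows that the eqiad settings cover 25 hosts (indices 07 to 31 inclusive).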
-- 
To view, visit https://gerrit.wikimedia.org/r/278072

Gerrit-MessageType: merged
Gerrit-Change-Id: I7d294389c82fd313ad5a6f2d9f01da0dd2ad0307
Gerrit-PatchSet: 2
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: Gehel <[email protected]>
Gerrit-Reviewer: Cindy-the-browser-test-bot <[email protected]>
Gerrit-Reviewer: DCausse <[email protected]>
Gerrit-Reviewer: EBernhardson <[email protected]>
Gerrit-Reviewer: Manybubbles <[email protected]>
Gerrit-Reviewer: Smalyshev <[email protected]>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list