I discovered the root cause: the current master was a VM in a bad state. Unfortunately I could not get onto the host to debug the issue; the node was still listening on 9200 but was not accessible via ssh. I forced a master change by shutting down the node using the cluster admin API. Once the master had switched, index deletion worked.
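In case it helps anyone else, the sequence was roughly the following (a sketch; the node ID below is a placeholder for the stuck master's ID, and the per-node _shutdown endpoint is the 1.x cluster admin API I'm referring to):

# Check which node is currently the elected master
curl 'http://es:9200/_cat/master?v'

# Shut down the stuck master so another master-eligible node takes over
# (replace the placeholder with the real node ID from the output above)
curl -XPOST 'http://es:9200/_cluster/nodes/STUCK_MASTER_NODE_ID/_shutdown'

# Confirm that the master has switched
curl 'http://es:9200/_cat/master?v'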
Hopefully I will be able to get some logs off of that box so I can learn what state it was in where it still thought it was the master yet couldn't perform master duties. This was a dedicated master node with master: true and data: false (a minimal config sketch follows the quoted message below).

On Wednesday, April 9, 2014 4:09:54 PM UTC-7, Dallas Mahrt wrote:
>
> My cluster has gotten into an odd state today. I have a regular job that
> deletes indices after X days. The job executed an index deletion this
> morning. When it did this, the cluster went into a red state, claiming
> there were 10 unassigned shards (5 primaries plus 1 replica each). After
> some debugging I discovered the shards were associated with this index. I
> was able to assign the shards manually to a node, which fixed the cluster
> state. I was then able to reproduce the issue by re-deleting the index. I
> captured a lot of data on this attempt and I may be able to repro again
> and get more. Any ideas on why this may have happened and how to prevent
> it?
>
> DETAILS:
> ES version: 1.0.1 (recent upgrade from 0.90.11). Indexes were not carried
> over.
>
> NOTE: This is after it had failed and was restored to a green state. The
> deleted index had data before the initial deletion.
>
> BEFORE
>
> *All indices status*
> This reported that the index (index-2014.04.03) existed with all shards
> in the STARTED state.
>
> curl 'http://es:9200/_status?pretty=true'
>
> "_shards" : {
>   "total" : 70,
>   "successful" : 70,
>   "failed" : 0
> },
>
> *Specific index status*
> This reported that the index had 10 successful shards.
>
> curl 'http://es:9200/index-2014.04.03/_status?pretty=true'
>
> "_shards" : {
>   "total" : 10,
>   "successful" : 10,
>   "failed" : 0
> },
>
> *Index Settings*
> This showed that the index existed with the expected settings.
>
> curl 'http://es:9200/_settings?pretty=true'
>
> "index-2014.04.03" : {
>   "settings" : {
>     "index" : {
>       "uuid" : "odEYl4lMQAiXFu4zQfBUeA",
>       "number_of_replicas" : "1",
>       "number_of_shards" : "5",
>       "refresh_interval" : "5s",
>       "version" : {
>         "created" : "1000199"
>       }
>     }
>   }
> },
>
> *Cluster State*
> Cluster state showed that there were no unassigned shards.
>
> curl 'http://es:9200/_cluster/state?pretty=true'
> ...
> "routing_nodes" : {
>   "unassigned" : [ ],
> ...
>
> Then I performed the delete:
>
> $ curl -X DELETE 'http://es:9200/index-2014.04.03'
> {"acknowledged":true}
>
> AFTER
>
> *All indices status*
> This no longer reports the deleted index (index-2014.04.03).
>
> curl 'http://es:9200/_status?pretty=true'
>
> "_shards" : {
>   "total" : 70,
>   "successful" : 60,
>   "failed" : 0
> },
>
> *Specific index status*
> This was the giveaway that we had an issue. It reports no data for the
> index, yet the shards are still counted against it and none are
> successful.
>
> curl 'http://es:9200/index-2014.04.03/_status?pretty=true'
>
> {
>   "_shards" : {
>     "total" : 10,
>     "successful" : 0,
>     "failed" : 0
>   },
>   "indices" : { }
> }
>
> *Index Settings*
> This showed that the supposedly deleted index still existed with its
> settings intact.
>
> curl 'http://es:9200/_settings?pretty=true'
>
> "index-2014.04.03" : {
>   "settings" : {
>     "index" : {
>       "uuid" : "odEYl4lMQAiXFu4zQfBUeA",
>       "number_of_replicas" : "1",
>       "number_of_shards" : "5",
>       "refresh_interval" : "5s",
>       "version" : {
>         "created" : "1000199"
>       }
>     }
>   }
> },
>
> *Cluster State*
> Cluster state showed the index and that all of its shards were
> unassigned.
>
> curl 'http://es:9200/_cluster/state?pretty=true'
> ...
> "index-2014.04.03" : { > "shards" : { > "2" : [ { > "state" : "UNASSIGNED", > "primary" : true, > "node" : null, > "relocating_node" : null, > "shard" : 2, > "index" : "index-2014.04.03" > }, { > ... > "routing_nodes" : { > "unassigned" : [ { > "state" : "UNASSIGNED", > "primary" : true, > "node" : null, > "relocating_node" : null, > "shard" : 2, > "index" : "index-2014.04.03" > }, { > ... > > > To 'correct' the situation I ran: > curl -XPOST 'es:9200/_cluster/reroute' -d '{"commands": [ {"allocate": > { "index": "index-2014.04.03", "shard": 4, "node": > "cvBlpg_jQTajnq-HdJCfCA", "allow_primary": true } }]}' > > > Let me know if there is more I can assist with. > -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/cddf2a1e-8c1f-484d-88f9-18bb022e7f0a%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
