I discovered the root cause. The current master was a VM in a bad state: it 
was still listening on 9200 but not accessible via ssh, so sadly I could not 
get onto the host to debug the issue. I forced a master change by shutting 
down the node using the cluster admin API. Once the master had switched, 
index deletion worked.
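
For reference, this is roughly the call I used. The node ID below is a 
placeholder, not the real value from my cluster:

```shell
# Sketch only -- ES_HOST and NODE_ID are hypothetical placeholders.
ES_HOST="es:9200"
NODE_ID="NODE_ID_OF_STUCK_MASTER"

# ES 1.x node shutdown API (removed in later major versions).
# Shutting down the stuck master forces the remaining master-eligible
# nodes to elect a new one.
URL="http://${ES_HOST}/_cluster/nodes/${NODE_ID}/_shutdown"
# curl -XPOST "$URL"

# Then verify that a different node was elected master:
# curl "http://${ES_HOST}/_cat/master?v"
```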

Hopefully I will be able to get some logs off that box so I can learn 
what state it was in such that it still thought it was the master yet 
couldn't perform master duties. This was a dedicated master node with 
node.master: true and node.data: false.
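
For anyone unfamiliar with dedicated masters, the relevant elasticsearch.yml 
settings look like this (a minimal sketch; property names per ES 1.x):

```yaml
# elasticsearch.yml -- dedicated master-eligible node
node.master: true   # eligible to be elected master
node.data: false    # holds no shard data
```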

On Wednesday, April 9, 2014 4:09:54 PM UTC-7, Dallas Mahrt wrote:
>
> My cluster has gotten into an odd state today. I have a regular job that 
> deletes indices after X days. The job executed an index deletion this 
> morning. When it did this the cluster went into a 'red state' claiming that 
> there were 10 unassigned shards (5 shards + 1 replica). After some 
> debugging I discovered the shards were associated with this index. I was 
> able to assign the shards manually to a node which fixed the cluster state. 
> I was then able to reproduce the issue by re-deleting the index. I captured 
> a lot of data on this attempt and I may be able to repro again and get 
> more. Any ideas on why this may have happened and how to prevent it? 
>
> DETAILS:
> ES version: 1.0.1 (recent upgrade from 0.90.11). Indexes were not carried 
> over. 
>
> NOTE: This is after it had failed and was restored to a green state. The 
> deleted index had data before the initial deletion.
>
> BEFORE
> *All indices status*
> This reported the index (index-2014.04.03) existed with all shards in the 
> STARTED state
>
> curl 'http://es:9200/_status?pretty=true'
>   "_shards" : {
>     "total" : 70,
>     "successful" : 70,
>     "failed" : 0
>
> *Specific index status*
> This reported that the index had 10 successful shards
>
> curl 'http://es:9200/index-2014.04.03/_status?pretty=true'
>
>   "_shards" : {
>     "total" : 10,
>     "successful" : 10,
>     "failed" : 0
>   },
>
> *Index Settings*
> This showed that the index exists with the expected settings
>
> curl 'http://es:9200/_settings?pretty=true'
>
>
> "index-2014.04.03" : {
>     "settings" : {
>       "index" : {
>         "uuid" : "odEYl4lMQAiXFu4zQfBUeA",
>         "number_of_replicas" : "1",
>         "number_of_shards" : "5",
>         "refresh_interval" : "5s",
>         "version" : {
>           "created" : "1000199"
>         }
>       }
>     }
>   },
>
> *Cluster State*
> Cluster state showed that there were no unassigned shards.
>
> curl 'http://es:9200/_cluster/state?pretty=true'
> ...
>  "routing_nodes" : {
>     "unassigned" : [ ],
> ..
>
> Then I performed the delete:
> $ curl -X DELETE 'http://es:9200/index-2014.04.03'
> {"acknowledged":true}
>
> AFTER
> *All indices status*
> This no longer reports the deleted index (index-2014.04.03)
>
> curl 'http://es:9200/_status?pretty=true'
>   "_shards" : {
>     "total" : 70,
>     "successful" : 60,
>     "failed" : 0
>   },
>
> *Specific index status*
> This was the giveaway that we had an issue. It reports no data for the 
> index except that it had associated shards, none of which were successful.
>
> curl 'http://es:9200/index-2014.04.03/_status?pretty=true'
> {
>   "_shards" : {
>     "total" : 10,
>     "successful" : 0,
>     "failed" : 0
>   },
>   "indices" : { }
> }
>
> *Index Settings*
> This showed that the index still exists with the expected settings
>
> curl 'http://es:9200/_settings?pretty=true'
> "index-2014.04.03" : {
>     "settings" : {
>       "index" : {
>         "uuid" : "odEYl4lMQAiXFu4zQfBUeA",
>         "number_of_replicas" : "1",
>         "number_of_shards" : "5",
>         "refresh_interval" : "5s",
>         "version" : {
>           "created" : "1000199"
>         }
>       }
>     }
>   },
>
> *Cluster State*
> Cluster state showed the index and that all of its shards were unassigned.
>
> curl 'http://es:9200/_cluster/state?pretty=true'
> ...
> "index-2014.04.03" : {
>         "shards" : {
>           "2" : [ {
>             "state" : "UNASSIGNED",
>             "primary" : true,
>             "node" : null,
>             "relocating_node" : null,
>             "shard" : 2,
>             "index" : "index-2014.04.03"
>           }, {
> ...
>    "routing_nodes" : {
>     "unassigned" : [ {
>       "state" : "UNASSIGNED",
>       "primary" : true,
>       "node" : null,
>       "relocating_node" : null,
>       "shard" : 2,
>       "index" : "index-2014.04.03"
>     }, {
> ...
>
>
> To 'correct' the situation I ran:
> curl -XPOST 'es:9200/_cluster/reroute' -d '{
>   "commands" : [ {
>     "allocate" : {
>       "index" : "index-2014.04.03",
>       "shard" : 4,
>       "node" : "cvBlpg_jQTajnq-HdJCfCA",
>       "allow_primary" : true
>     }
>   } ]
> }'
>
>
> Let me know if there is more I can assist with.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/cddf2a1e-8c1f-484d-88f9-18bb022e7f0a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
