And on elasticsearch01 (the node referred to in the error message) I'm seeing a whole lot of these:
[2014-02-13 19:13:42,391][WARN ][transport ] [elasticsearch01] Received response for a request that has timed out, sent [1138869ms] ago, timed out [238869ms] ago, action [index/shard/recovery/prepareTranslog], node [[elasticsearch03][LgR5cuiCQmSfOTfTl6t1qA][inet[/10.84.100.219:9300]]], id [702105]

[2014-02-13 19:13:43,573][WARN ][cluster.action.shard ] [elasticsearch01] [vgd][3] received shard failed for [vgd][3], node[LgR5cuiCQmSfOTfTl6t1qA], relocating [VuACiBeiToyz7xEZ5RJsxQ], [P], s[INITIALIZING], indexUUID [-5I0LkSET8GXIaOLCpnQUQ], reason [Failed to start shard, message [RecoveryFailedException[[vgd][3]: Recovery failed from [elasticsearch01][VuACiBeiToyz7xEZ5RJsxQ][inet[/10.84.200.129:9300]] into [elasticsearch03][LgR5cuiCQmSfOTfTl6t1qA][inet[/10.84.100.219:9300]]]; nested: RemoteTransportException[[elasticsearch01][inet[/10.84.200.129:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[vgd][3] Phase[2] Execution failed]; nested: ReceiveTimeoutTransportException[[elasticsearch03][inet[/10.84.100.219:9300]][index/shard/recovery/prepareTranslog] request_id [712931] timed out after [900000ms]]; ]]

On Thursday, February 13, 2014 7:25:22 PM UTC+1, Christer wrote:
>
> Earlier today I added a third node to our cluster. It shares the same
> version of elasticsearch (0.90.10) and jvm (1.7.0_13) as the two existing
> nodes.
>
> Now, some hours after I added the node, two shards are still "relocating".
> The status of the cluster is green though. I'm getting some errors in the
> log of the node I added:
>
> [2014-02-13 19:13:43,572][WARN ][cluster.action.shard ] [elasticsearch03] [vgd][3] sending failed shard for [vgd][3], node[LgR5cuiCQmSfOTfTl6t1qA], relocating [VuACiBeiToyz7xEZ5RJsxQ], [P], s[INITIALIZING], indexUUID [-5I0LkSET8GXIaOLCpnQUQ], reason [Failed to start shard, message [RecoveryFailedException[[vgd][3]: Recovery failed from [elasticsearch01][VuACiBeiToyz7xEZ5RJsxQ][inet[/10.84.200.129:9300]] into [elasticsearch03][LgR5cuiCQmSfOTfTl6t1qA][inet[/10.84.100.219:9300]]]; nested: RemoteTransportException[[elasticsearch01][inet[/10.84.200.129:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[vgd][3] Phase[2] Execution failed]; nested: ReceiveTimeoutTransportException[[elasticsearch03][inet[/10.84.100.219:9300]][index/shard/recovery/prepareTranslog] request_id [712931] timed out after [900000ms]]; ]]
>
> It says it "timed out", but there are no connection issues between the
> nodes as far as I can tell. The new node has ~2M docs, whereas node1 and 2
> have ~45M (which is the total number of indexed docs). The new node also
> uses quite a lot of CPU, as it has been doing since it joined the cluster
> earlier today.
>
> Any tips on how to debug this problem any further so I can have a three
> node cluster up and running?
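
For what it's worth, the two durations in the transport warning from elasticsearch01 can be checked against the 900000ms timeout with a few lines of Python. This is only a rough sketch: `timeout_timings` is a hypothetical helper of mine, not anything shipped with elasticsearch, and it assumes "sent" and "timed out" are both measured back from the moment the late response arrived.

```python
import re

# Warning text copied from the elasticsearch01 log (node details trimmed).
WARNING = (
    "Received response for a request that has timed out, "
    "sent [1138869ms] ago, timed out [238869ms] ago, "
    "action [index/shard/recovery/prepareTranslog]"
)


def timeout_timings(line):
    """Hypothetical helper: return (sent_ms, late_ms, timeout_ms) for one warning.

    Assumes both durations count back from the arrival of the late response,
    so the timeout that was in effect is their difference.
    """
    match = re.search(r"sent \[(\d+)ms\] ago, timed out \[(\d+)ms\] ago", line)
    if match is None:
        raise ValueError("not a timed-out-response warning")
    sent_ms = int(match.group(1))
    late_ms = int(match.group(2))
    return sent_ms, late_ms, sent_ms - late_ms


sent_ms, late_ms, timeout_ms = timeout_timings(WARNING)
print(f"timeout in effect: {timeout_ms} ms")
print(f"response took:     {sent_ms / 60000:.1f} min")
print(f"late by:           {late_ms / 60000:.1f} min")
```

The difference comes out at 900000 ms, the same value as in "timed out after [900000ms]". Note that the warning (id [702105]) and the exception (request_id [712931]) are for different requests, so this only suggests that prepareTranslog responses from elasticsearch03 do eventually arrive, just several minutes past the 15 minute limit.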
