On Wed, Jun 25, 2014 at 8:05 AM, James Carr <[email protected]> wrote:

> I launched two new EC2 instances to join the cluster and watched. Some
> shards began relocating, no big deal. Six hours later I checked in and
> some shards were still relocating, one shard was recovering. Weird but
> whatever... the cluster health is still green and searches are working
> fine.


I add new nodes every once in a while and it can take a few hours for
everything to balance out, but six hours is a bit long.  It's possible.  Do
you have graphs of the count of relocating shards?  Something like this can
really help you figure out if everything balanced out at some point and
then unbalanced.  Example:
http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Elasticsearch%20cluster%20eqiad&h=elastic1001.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1403698335&v=0&m=es_relocating_shards&vl=shards&ti=es_relocating_shards&z=large
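
If you don't have graphs yet, a rough way to start collecting the numbers is
to poll the cluster health API and log the shard counters.  This is only a
sketch: it assumes a node is reachable at http://localhost:9200 and that you
feed the printed lines into whatever graphing tool you use.  The function
names, the URL and the 60 second interval are made up for illustration.

```python
import json
import time
import urllib.request


def shard_counts(health):
    """Pick the shard counters out of a parsed _cluster/health response."""
    keys = ("relocating_shards", "initializing_shards", "unassigned_shards")
    # Treat a missing counter as 0 so a partial response doesn't crash the poller.
    return {k: int(health.get(k, 0)) for k in keys}


def poll(url="http://localhost:9200/_cluster/health", interval=60):
    """Print a timestamp, the cluster status and the shard counters forever."""
    while True:
        with urllib.request.urlopen(url) as resp:
            health = json.load(resp)
        print(int(time.time()), health.get("status"), shard_counts(health))
        time.sleep(interval)
```

Call poll() from a small script and leave it running while the new nodes
join.  If relocating_shards drops to 0 and later climbs again, the cluster
balanced out and then unbalanced.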

> Then I got an alert at 2:30am that the cluster state is now
> yellow and find that we have 3 shards marked as recovering and 2
> shards that are unassigned. The cluster still technically works but 24
> hours later after the new nodes were added I feel like my only choice
> to get a green cluster again will be to simply launch 5 fresh nodes
> and replay all the data from backups into it. Ugggggh.
>

This sounds like one of the nodes bounced.  It can take a long time to
recover from that.  It's something that is being worked on.  Check the logs
on each node and see if there is anything about a node leaving or restarting.

One thing to make sure of is that you set the minimum number of master
nodes (discovery.zen.minimum_master_nodes) correctly on all nodes.  If you
have five master eligible nodes then set it to 3.  If the two new nodes
aren't master eligible (so you have three master eligible nodes) then set
it to 2.
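
Both numbers above are a simple majority of the master eligible nodes, i.e.
floor(n / 2) + 1.  A tiny sketch of that arithmetic, with a function name I
made up for illustration:

```python
def minimum_master_nodes(master_eligible):
    """Return a majority (quorum) of the master eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master eligible node")
    # Integer division rounds down, so 3 -> 2 and 5 -> 3.
    return master_eligible // 2 + 1


for n in (3, 5):
    print(n, "master eligible ->", minimum_master_nodes(n))
```

Requiring a majority means that two halves of a partitioned cluster can't
both elect a master at the same time.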


> SERIOUSLY! What can I do to prevent this? I feel like I am missing
> something because I always heard the strength of elasticsearch is its
> ease of scaling out but it feels like every time I try it falls to the
> floor. :-(
>

It's always been pretty painless for me.  I did have trouble when I added
nodes that were broken: one time I added nodes without SSDs to a cluster
with SSDs.  Another time I didn't set the heap size on the new nodes and
they worked until some shards moved to them.  Then they fell over.

Nik
