I've verified that shards are re-allocating after a cluster restart (again, I'm using 1.0 RC1). To test this I loaded a small dataset, since verifying results on a large dataset can take a very long time. It's easy to reproduce:

1. In a 5-node cluster, load some Apache log data. (I loaded only a couple dozen days' worth.)
2. Let the cluster run until all shards are allocated; es-head is handy for watching this.
3. Flush and shut down the cluster.
4. Bring up only one node and point es-head at it; it should show all 5 shards for each index residing on the lone active node.
5. Bring up a second node, then a third, refreshing es-head every 15 seconds or so.

Shards are observed first replicating to the second node; then, when the third node becomes active, the shards are re-allocated again for balancing. So either the entry I made in elasticsearch.yml to disable shard allocation is incorrect, or there is likely a bug. (Or I might fundamentally misunderstand what disabling shard re-allocation is supposed to do.) Maybe I'll re-test on a 0.90 cluster to see if it behaves differently...

Tony
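P.S. To rule out the first possibility, here is how I'm checking whether the setting actually took effect. This is just a sketch of what I plan to run against one of the nodes (assuming the default HTTP port 9200 on localhost); the transient form is the curl variant I originally derived the yml entry from:

    # Show what each node actually picked up from elasticsearch.yml
    curl -XGET 'http://localhost:9200/_nodes/settings?pretty'

    # Show cluster-level overrides applied via the API (empty if none were set)
    curl -XGET 'http://localhost:9200/_cluster/settings?pretty'

    # Disable allocation at runtime, equivalent to the yml entry
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "transient": {
        "cluster.routing.allocation.disable_allocation": true
      }
    }'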
On Friday, February 7, 2014 4:29:57 PM UTC-8, Tony Su wrote:
> Hi Ivan,
> Thx.
>
> Yes, I have been doing a flush before every cluster shutdown now.
> Running ES 1.0 RC1.
>
> I have been doing rolling restarts because I have been unable to start
> all nodes nearly at once and get all nodes to join, even after extending
> the timeout as I described. But I'm speculating that doing a rolling
> restart is contributing to the shards being re-allocated, because nodes
> that contain shards for the index may not appear soon enough.
>
> Maybe the entry I made in elasticsearch.yml exactly as I described isn't
> correct? I derived it from an ES source that described sending the
> command using curl, but I thought it better to enter it directly in
> elasticsearch.yml.
>
> I'll take a look at your link, thx.
>
> Tony
>
> On Friday, February 7, 2014 3:23:24 PM UTC-8, Ivan Brusic wrote:
>
>> Shard allocation should never happen if disable_allocation is enabled.
>> Which version are you using? Are you doing a rolling restart or a full
>> cluster restart?
>>
>> Two things that might help. First, execute a flush before restarting; I
>> believe mismatched transaction states will label a shard as incorrect
>> during a restart. Also, play around with the recovery settings [1]. Try
>> setting gateway.recover_after_nodes (disabled by default).
>>
>> [1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after
>>
>> Cheers,
>>
>> Ivan
>>
>> On Fri, Feb 7, 2014 at 3:11 PM, Tony Su <[email protected]> wrote:
>>
>>> At first, I noticed what some have called "shard thrashing," i.e.
>>> during startup shards are re-allocated as nodes come online.
>>>
>>> I have implemented the following by either creating new settings or
>>> modifying existing settings in elasticsearch.yml:
>>>
>>> 1. Disable allocation altogether:
>>>    cluster.routing.allocation.disable_allocation: true
>>>
>>> 2. Avoid split-brain in the current 5-node cluster:
>>>    discovery.zen.minimum_master_nodes: 3
>>>
>>> 3. Increase the discovery timeout:
>>>    discovery.zen.ping.timeout: 100s
>>>
>>> Specific objective:
>>> When the cluster restarts, try to force re-use of the shard allocation
>>> that existed before shutdown.
>>>
>>> Attempt:
>>> - Tried increasing discovery.zen.minimum_master_nodes to 5 in the
>>> 5-node cluster, with the idea that a node would refuse to become
>>> operational until all 5 nodes in the cluster were recognized.
>>>
>>> Result:
>>> Unfortunately, despite making this setting equal to the total number
>>> of nodes in the cluster, I observed shard re-allocation once 4 of the
>>> 5 nodes were up, without waiting for the fifth node to come online.
>>> And this was with allocation disabled.
>>>
>>> I would like an opinion on whether what I'm trying to accomplish is
>>> even possible:
>>> - As much as possible, force a restarted cluster to use existing
>>> shards as already allocated.
>>> - Start all nodes at once rather than doing rolling node starts, which
>>> contributes to shard re-allocation.
>>>
>>> TIA,
>>> Tony
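P.P.S. Re: Ivan's pointer to the recovery settings above, this is the elasticsearch.yml I plan to try next. It keeps the settings I already have and adds the gateway recovery gates from the linked docs; the recover_after_nodes / recover_after_time / expected_nodes values are untested guesses of mine for a 5-node cluster, not anything I've confirmed:

    # Keep shards where they are across restarts
    cluster.routing.allocation.disable_allocation: true

    # Avoid split-brain in a 5-node cluster
    discovery.zen.minimum_master_nodes: 3
    discovery.zen.ping.timeout: 100s

    # Hold off recovery until the cluster looks like itself again
    gateway.recover_after_nodes: 4   # don't start recovery until at least 4 nodes join
    gateway.recover_after_time: 5m   # then wait up to 5m for remaining nodes
    gateway.expected_nodes: 5        # recover immediately once all 5 are present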
