Re: Replica shard stuck at initializing after client and data node restart

Glenn Snead Mon, 24 Mar 2014 08:20:49 -0700

Mark, thank you for responding.  It is quite odd, but what happened this 
morning is stranger.


I dropped and recreated the replica shard on the unused index.  Now the 
in-use index shows Green.  
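
For reference, dropping and re-adding a replica can be sketched with the index 
update-settings API.  This is only a minimal sketch, not necessarily the exact 
commands used here: the host http://localhost:9200 and the index name 
unused_index are placeholders, and replica_settings_request is a hypothetical 
helper.  It only builds the requests; nothing is sent until one is passed to 
urllib.request.urlopen().

```python
import json
import urllib.request


def replica_settings_request(base_url, index, replicas):
    """Build (but do not send) a PUT request that sets number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    url = "{}/{}/_settings".format(base_url.rstrip("/"), index)
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder host and index name (assumptions, not taken from the cluster above).
    drop = replica_settings_request("http://localhost:9200", "unused_index", 0)
    add = replica_settings_request("http://localhost:9200", "unused_index", 1)
    for req in (drop, add):
        # Show what would be sent; call urllib.request.urlopen(req) to apply it.
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Building the request separately from sending it makes it easy to review the 
exact URL and body before touching a live cluster.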

FYI we're running ES version 0.90, and my in-use index is 717 GB with 
135M+ documents.  

On Friday I ran status reports on each index and compared the two.  Nothing 
showed as "failed", "red", or plainly "wrong", so I left it over the 
weekend.  When I came in today the cluster was still Yellow.
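
A per-index comparison like this can also be sketched against the cluster 
health API rather than the index status reports.  This is a minimal sketch 
under assumptions: the host is a placeholder, fetch_health and 
non_green_indices are hypothetical helper names, and the sample data in the 
usage section is made up for illustration.

```python
import json
import urllib.request


def fetch_health(base_url):
    """Fetch cluster health with per-index detail (performs a network call)."""
    url = base_url.rstrip("/") + "/_cluster/health?level=indices"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def non_green_indices(health):
    """Return {index: (status, initializing, unassigned)} for non-green indices."""
    result = {}
    for name, info in health.get("indices", {}).items():
        if info.get("status") != "green":
            result[name] = (
                info.get("status"),
                info.get("initializing_shards", 0),
                info.get("unassigned_shards", 0),
            )
    return result


if __name__ == "__main__":
    # Made-up sample response; on a real cluster use
    # fetch_health("http://localhost:9200") instead (placeholder host).
    sample = {
        "status": "yellow",
        "indices": {
            "in_use": {"status": "yellow", "initializing_shards": 1, "unassigned_shards": 0},
            "unused": {"status": "green", "initializing_shards": 0, "unassigned_shards": 0},
        },
    }
    print(non_green_indices(sample))
```

Keeping the parsing separate from the HTTP call means the same function works 
on a saved health response, which helps when comparing Friday's state with 
Monday's.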

Any idea whether recreating the other index's replica shard caused the 
cluster's status to go green?  It feels like a fluke, but I'm new to ES.

If this is indeed an expected ES behavior, I'll add this to my restoral 
procedures.

On Friday, March 21, 2014 9:48:27 PM UTC-4, Mark Walkom wrote:
>
> What version are you running?
>
> It's odd this would happen if, when you set replicas to zero, the cluster 
> state is green and your index is ok.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: [email protected]
> web: www.campaignmonitor.com
>
>
> On 22 March 2014 06:15, Glenn Snead <[email protected]> wrote:
>
>> I have a six node cluster: 2 master nodes and 4 client / data nodes.  I 
>> have two indices: one with data and one that is set aside for future 
>> use.  I'm having trouble with the index that is in use.
>> After making some limits.conf configuration changes and restarting the 
>> impacted nodes, the replica shard of one of my indices will not complete 
>> initialization.
>> I wasn't in charge of the node restarts and here is the sequence of 
>> events:
>> Shut down the client and data nodes on each of the four servers.
>> Start the client and data node on each server.
>> I don't believe the cluster was given time to reallocate or to 
>> move shards.
>>
>> limits.conf changes: 
>> - memlock unlimited
>> hard nofiles 32000
>> soft nofiles 32000
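
One way to confirm that limits like these took effect is to read them back 
from a fresh session of the user that runs Elasticsearch.  The following is a 
minimal sketch, assuming the "nofiles" entries above refer to the limits.conf 
item named nofile; current_limits and describe_limit are hypothetical helper 
names.  It reports the limits of the Python process itself, not of an 
already-running ES process (on Linux those are listed in /proc/<pid>/limits).

```python
import resource


def describe_limit(value):
    """Render a raw rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return the soft and hard limits for open files and locked memory."""
    limits = {}
    for label, res in (
        ("nofile", resource.RLIMIT_NOFILE),
        ("memlock", resource.RLIMIT_MEMLOCK),
    ):
        soft, hard = resource.getrlimit(res)
        limits[label] = {"soft": describe_limit(soft), "hard": describe_limit(hard)}
    return limits


if __name__ == "__main__":
    for label, values in current_limits().items():
        print(label, "soft =", values["soft"], "hard =", values["hard"])
```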
>>
>> Here's what I have tried thus far:
>>
>> Drop the replica shard, which brings the cluster status to Green.  
>> Verify the cluster's status - no replication, no reallocating, etc.  
>> Re-add the replica shard.  
>>
>> Drop the replica shard and the data nodes that were to carry the replica 
>> shard.  
>> Verify the cluster's status.  
>> Start the data nodes and allow the cluster to reallocate primary shards.  
>>  - The cluster's status is Green.
>> Add the replica shard to the index.  The replica shard never completes 
>> initialization, even over a 24 hour period.
>>
>> I've checked the transaction log files on each node and they are all 
>> zero-length files.
>> The nodes holding the replica shard also hold primary shards for the 
>> unused index.
>> These nodes had copied their matching primary node's index Size (as seen in 
>> Paramedic), but now Paramedic shows an index Size of only a few bytes.  The 
>> index folder on the replica shard servers still has the data.
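
The transaction log check mentioned above can be sketched as a walk over a 
node's data directory.  This is a minimal sketch under an assumption: it treats 
any file inside a directory named "translog" as a transaction log file, and the 
data path in the usage section is a placeholder.  translog_sizes and 
non_empty_translogs are hypothetical helper names.

```python
import os


def translog_sizes(data_root):
    """Map each file found under a 'translog' directory to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(data_root):
        if os.path.basename(dirpath) == "translog":
            for name in filenames:
                path = os.path.join(dirpath, name)
                sizes[path] = os.path.getsize(path)
    return sizes


def non_empty_translogs(data_root):
    """Return only the translog files that contain data."""
    return {path: size for path, size in translog_sizes(data_root).items() if size > 0}


if __name__ == "__main__":
    # Placeholder path; point this at the node's actual data directory.
    for path, size in sorted(translog_sizes("/var/lib/elasticsearch").items()):
        print(size, path)
```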
>>
>> Unknown to me, my target system was put online and my leadership doesn't 
>> want to schedule an outage window.  Most of my research suggests that I drop 
>> the impacted index and re-initialize.  I can replace the data, but this 
>> would impact the user interface while the index re-ingests the 
>> documents.  This issue has occurred before on my test system and the fix was 
>> to rebuild the index.  However, I never learned why the replica shard had 
>> the issue in the first place.
>>
>> My questions are:
>> - Does the index Size (shown in Paramedic) on the servers hosting the 
>> replica shard indicate a course of action?
>> - Is it possible to resolve this without dropping the index and 
>> rebuilding?  I'd hate to resort to this each time we attempt ES server 
>> maintenance or configuration changes.
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/a868b3da-fd28-49b4-bc8f-2f60f2c34ec7%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

