Re: Replica shard stuck at initializing after client and data node restart

Mark Walkom Fri, 21 Mar 2014 18:49:17 -0700

What version are you running?

It's odd that this would happen if, when you set replicas to zero, the
cluster state is green and your index is OK.
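
If it helps, both pieces of information are available over HTTP: the root endpoint reports the version and `/_cluster/health` reports the status. Below is a minimal sketch, assuming a node reachable at `localhost:9200` (the address is an assumption, not something from this thread; the helper names are illustrative only):

```python
import json
from urllib.request import urlopen

# Assumed address of any node's HTTP endpoint; adjust for your cluster.
BASE_URL = "http://localhost:9200"


def extract_version(root_response: dict) -> str:
    """Return version.number from the JSON served at the root endpoint."""
    return root_response["version"]["number"]


def extract_status(health_response: dict) -> str:
    """Return the status field (green, yellow or red) from /_cluster/health."""
    return health_response["status"]


def fetch_json(path: str, base_url: str = BASE_URL) -> dict:
    """GET a path on the node and decode the JSON body."""
    with urlopen(base_url + path, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def report() -> None:
    """Print the node version and the current cluster status."""
    print("version:", extract_version(fetch_json("/")))
    print("status:", extract_status(fetch_json("/_cluster/health")))
```

Calling `report()` against a live node prints the two values to include in a follow-up message.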


Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [email protected]
web: www.campaignmonitor.com


On 22 March 2014 06:15, Glenn Snead <[email protected]> wrote:

> I have a six-node cluster: 2 master nodes and 4 client / data nodes.  I
> have two indices: one with data and one that is set aside for future
> use.  I'm having trouble with the index that is in use.
> After making some limits.conf configuration changes and restarting the
> impacted nodes, the replica shard of one of my indices will not complete
> initialization.
> I wasn't in charge of the node restarts and here is the sequence of events:
> Shut down the client and data nodes on each of the four servers.
> Start the client and data node on each server.
> I don't believe the cluster was given time to reallocate or to move
> shards.
>
> limits.conf changes:
> - memlock unlimited
> hard nofiles 32000
> soft nofiles 32000
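
Since limits.conf is applied by pam_limits when a session starts, it is worth confirming that the new values actually reach the process. Below is a minimal sketch using Python's standard `resource` module. It reports the limits of the Python process itself, so it only approximates what the Elasticsearch JVM sees if it is run as the same user and started the same way (an assumption you would need to verify). Note also that pam_limits documents the open-files item as `nofile`, so the spelling in the actual file may be worth a look.

```python
import resource


def describe(limit: int) -> str:
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def current_limits() -> dict:
    """Return (soft, hard) pairs for open files and locked memory."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE),
        "memlock": resource.getrlimit(resource.RLIMIT_MEMLOCK),
    }


def report() -> None:
    """Print the soft and hard limits seen by this process."""
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={describe(soft)} hard={describe(hard)}")
```

Running `report()` from a fresh login as the Elasticsearch user shows whether the 32000 and unlimited values are in effect for that session.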
>
> Here's what I have tried thus far:
>
> Drop the replica shard, which brings the cluster status to Green.
> Verify the cluster's status - no replication, no reallocating, etc.
> Re-add the replica shard.
>
> Drop the replica shard and the data nodes that were to carry the replica
> shard.
> Verify the cluster's status.
> Start the data nodes and allow the cluster to reallocate primary shards.
>  - The cluster's status is Green.
> Add the replica shard to the index.  The replica shard never completes
> initialization, even over a 24 hour period.
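
For reference, the drop and re-add steps described above can be sketched against the REST API as follows. This is only an illustration: the node address and the index name `my_index` are hypothetical, and the helpers are not from any client library. It uses the index settings endpoint to change `number_of_replicas` and the cluster health endpoint with `level=shards` to watch per-shard state.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9200"  # assumed node address
INDEX = "my_index"  # hypothetical index name


def replica_request(index: str, replicas: int, base_url: str = BASE_URL) -> Request:
    """Build the PUT /<index>/_settings request that sets number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return Request(
        f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def health_url(index: str, base_url: str = BASE_URL) -> str:
    """URL for the per-shard health of one index."""
    return f"{base_url}/_cluster/health/{index}?level=shards"


def send(req) -> dict:
    """Send a Request (or plain URL) and decode the JSON response."""
    with urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send(replica_request(INDEX, 0))`, waiting for green, and then `send(replica_request(INDEX, 1))` reproduces the sequence; the `initializing_shards` and `unassigned_shards` counters returned by `send(health_url(INDEX))` show whether the replica is making progress.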
>
> I've checked the transaction log files on each node and they are all
> zero-length files.
> The nodes holding the replica shard also hold the primary shards for the
> unused index.
> These nodes copied their matching primary node's index Size (as seen in
> Paramedic), but now Paramedic shows an index Size of only a few bytes.  The
> index folder on the replica shard servers still has the data.
>
> Unknown to me, my target system was put online and my leadership doesn't
> want to schedule an outage window.  Most of my research suggests that I
> drop the impacted index and re-initialize.  I can replace the data, but
> this would impact the user interface while the index re-ingests the
> documents.  This issue has occurred before on my test system and the fix
> was to rebuild the index.  However, I never learned why the replica shard
> had the issue in the first place.
>
> My questions are:
> - Does the index Size (shown in Paramedic) on the servers hosting the
> replica shard indicate a course of action?
> - Is it possible to resolve this without dropping the index and
> rebuilding?  I'd hate to resort to this each time we attempt ES server
> maintenance or configuration changes.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624ZsMpqnF16T_3-ZzDy7SjcsFouaDOBQQEEATby%2B7Lorzg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
