[jira] [Commented] (SOLR-8367) The new LIR 'all replicas participate' failsafe code needs to be improved.

Mike Drob (JIRA) Wed, 09 Dec 2015 10:34:21 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049146#comment-15049146
 ]


Mike Drob commented on SOLR-8367:
---------------------------------

bq. Since they are all fired off async, I don't know that it is really worth 
it. All the isClosed stuff is really just best effort to bail early, but not 
really critical it's at every point.
I see what you're saying about it being async, so it was still possible for a 
close to sneak in before this patch as well. If we're closed but still request 
a replica to recover, I see that it has its own checks for shutting down and 
closed as well, so things will be fine there.
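
To make the reasoning above concrete, here is a minimal sketch of the pattern 
as I understand it. It is not the Solr code: the {{Sender}} and {{Receiver}} 
classes and their methods are hypothetical stand-ins, and the only point is 
that the sender's closed check is a best-effort early exit while the receiver 
guards itself.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class BestEffortCloseSketch {

    // Hypothetical stand-in for the replica that is asked to recover.
    static class Receiver {
        final AtomicBoolean closed = new AtomicBoolean(false);

        // The receiver re-checks its own state, so a late request is harmless.
        boolean handleRecoveryRequest() {
            if (closed.get()) {
                return false; // shutting down: ignore the request
            }
            return true; // a real implementation would start recovery here
        }
    }

    // Hypothetical stand-in for the side that fires off recovery requests.
    static class Sender {
        final AtomicBoolean closed = new AtomicBoolean(false);

        boolean requestRecovery(Receiver target) {
            if (closed.get()) {
                return false; // best-effort early bail, nothing more
            }
            // A close could land right here; the check above cannot prevent
            // that, which is why the receiver's own check is what matters.
            return target.handleRecoveryRequest();
        }
    }

    public static void main(String[] args) {
        Sender sender = new Sender();
        Receiver receiver = new Receiver();
        System.out.println(sender.requestRecovery(receiver)); // both open
        receiver.closed.set(true);
        System.out.println(sender.requestRecovery(receiver)); // receiver closed
    }
}
```

Because the receiver refuses work once it is closed, a missed check on the 
sending side costs one wasted request and nothing else.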

Unrelated: while trying to trace the execution path here, I noticed that 
{{CoreAdminHandler::handleRequestRecoveryAction}} creates a thread and starts 
it without either giving it a name or submitting it to an executor. Should I 
file a separate JIRA for that? It looks like that thread was added by you in 
SOLR-4254; maybe that was before we had the executors everywhere.
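
For illustration, a rough sketch of the alternative I have in mind, using only 
plain JDK classes. The class name, the {{namedFactory}} helper and the 
"requestRecovery" prefix are made up for this example and are not what 
{{CoreAdminHandler}} does today; the idea is just that the work goes through 
an executor whose threads carry a recognizable name.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedRecoveryExecutorSketch {

    // Hypothetical factory: names each thread "<prefix>-<n>" so it is easy
    // to identify in thread dumps and log lines.
    static ThreadFactory namedFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    // Submits a task to the pool and returns the name of the thread that ran it.
    static String runAndReportThreadName(ExecutorService pool) {
        Callable<String> task = () -> Thread.currentThread().getName();
        try {
            return pool.submit(task).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool =
            Executors.newSingleThreadExecutor(namedFactory("requestRecovery"));
        try {
            System.out.println(runAndReportThreadName(pool));
        } finally {
            pool.shutdown(); // an executor also gives us one place to shut down
        }
    }
}
```

Compared with a bare {{new Thread(...).start()}}, this keeps the thread 
identifiable and lets shutdown be handled in one place.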

> The new LIR 'all replicas participate' failsafe code needs to be improved.
> --------------------------------------------------------------------------
>
>                 Key: SOLR-8367
>                 URL: https://issues.apache.org/jira/browse/SOLR-8367
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>         Attachments: SOLR-8367.patch, SOLR-8367.patch, SOLR-8367.patch
>
>
> For one, it currently only kicks in for the first attempted leader. If it's 
> another replica that is stuck in LIR, it won't help.
> Second, when we attempt to be leader, knowing we might fail due to LIR, we 
> should not put other replicas into recovery if they fail to sync with us - 
> not until we know we will actually be leader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

