[
https://issues.apache.org/jira/browse/SOLR-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049146#comment-15049146
]
Mike Drob commented on SOLR-8367:
---------------------------------
bq. Since they are all fired off async, I don't know that it is really worth
it. All the isClosed stuff is really just best effort to bail early, but not
really critical it's at every point.
I see what you're saying about it being async, so it was still possible for a
close to sneak in before this patch as well. If we're closed, but still request
a replica to recover, then I see that it has its own checks for shutting down
and closed as well so things will be fine there.
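
To make the reasoning above concrete, here is a minimal sketch of a best-effort closed check on the requesting side, backed by a second check on the receiving side. It is not the Solr code: the class {{BestEffortCloseCheck}}, its methods, and the replica names are all hypothetical and only illustrate why a close that sneaks in after the first check is still harmless.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; none of these names come from Solr.
class BestEffortCloseCheck {
  private final AtomicBoolean closed = new AtomicBoolean(false);
  final List<String> recovered = new ArrayList<>();

  void close() {
    closed.set(true);
  }

  // Requesting side: bail out early if we already know we are closed.
  // This is best effort, because close() may run right after this check.
  boolean requestRecovery(String replica) {
    if (closed.get()) {
      return false;
    }
    return doRecovery(replica);
  }

  // Receiving side: re-checks the closed state on its own, so a request
  // that slipped past the early check is still refused.
  boolean doRecovery(String replica) {
    if (closed.get()) {
      return false;
    }
    recovered.add(replica);
    return true;
  }

  public static void main(String[] args) {
    BestEffortCloseCheck c = new BestEffortCloseCheck();
    System.out.println(c.requestRecovery("replica1"));
    c.close();
    // Simulates a request that passed the early check before close() ran.
    System.out.println(c.doRecovery("replica2"));
    System.out.println(c.recovered);
  }
}
```

In this sketch the early check only saves work. Correctness rests on the receiving side's own check, which is why missing the early check at one call site is not critical.
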
Unrelated: while trying to trace the execution path here, I noticed that
{{CoreAdminHandler::handleRequestRecoveryAction}} creates a thread and starts
it without either giving it a name or submitting it to an executor. Should I
file a separate JIRA for that? Looks like that thread was added by you in
SOLR-4254, maybe that was before we had the executors everywhere.
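
As a rough sketch of the alternative, the work could be submitted to an executor whose threads carry a recognizable name. This uses only JDK classes. The class {{NamedRecoveryExecutor}}, the {{requestRecovery}} prefix, and both helper methods are made up for the example and do not reflect what {{CoreAdminHandler}} or Solr's own executor utilities actually do.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from Solr.
class NamedRecoveryExecutor {

  // Builds a factory that names each thread "<prefix>-<n>", so the thread
  // can be identified in thread dumps and logs.
  static ThreadFactory namedFactory(String prefix) {
    AtomicInteger counter = new AtomicInteger();
    return runnable -> {
      Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
      t.setDaemon(true);
      return t;
    };
  }

  // Submits a task to the executor and returns the name of the thread
  // that ran it. A real task would do the recovery work here instead.
  static String runAndReportThreadName(ExecutorService executor) throws Exception {
    Callable<String> task = () -> Thread.currentThread().getName();
    Future<String> result = executor.submit(task);
    return result.get(5, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws Exception {
    ExecutorService executor =
        Executors.newCachedThreadPool(namedFactory("requestRecovery"));
    try {
      System.out.println(runAndReportThreadName(executor));
    } finally {
      // The executor gives one place to shut down outstanding work.
      executor.shutdown();
    }
  }
}
```

Compared with a bare {{new Thread(...).start()}}, the task now runs on a named thread and its lifecycle is tied to an executor that can be shut down with the rest of the component.
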
> The new LIR 'all replicas participate' failsafe code needs to be improved.
> --------------------------------------------------------------------------
>
> Key: SOLR-8367
> URL: https://issues.apache.org/jira/browse/SOLR-8367
> Project: Solr
> Issue Type: Bug
> Reporter: Mark Miller
> Assignee: Mark Miller
> Attachments: SOLR-8367.patch, SOLR-8367.patch, SOLR-8367.patch
>
>
> For one, it currently only kicks in for the first attempted leader. If it's
> another replica that is stuck in LIR, it won't help.
> Second, when we attempt to be leader, knowing we might fail due to LIR, we
> should not put other replicas into recovery if they fail to sync with us -
> not until we know we will actually be leader.