[ https://issues.apache.org/jira/browse/SOLR-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690673#comment-17690673 ]
Ishan Chattopadhyaya commented on SOLR-6405:
--------------------------------------------
In my testing with solr-bench, I've seen many cases (roughly 1 in 25-30) where
a restarted node comes up and recovery runs for a few replicas but never
completes for all of them, leaving the node with some replicas in the DOWN
state. I tracked these down to Solr not reconnecting to ZooKeeper after a
session loss.
I should add that this is repeatable for me, but reproducing it takes several
hours (or even days) of running. The situation was so annoying while
developing the test suite (because of the infinite hang waiting for all
replicas to come up) that I bailed out of those cases with a timeout, failed
the test, and moved on. It is definitely on my radar to revisit and fix. FYI
[~noblepaul].
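For illustration only, the "bail out with a timeout" workaround described above could look roughly like the sketch below. This is not the solr-bench code: the class name ReplicaWaitSketch, the method waitFor, and the allReplicasActive check are all hypothetical, and how replica state is actually read from the cluster is left to the caller.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Solr or solr-bench.
final class ReplicaWaitSketch {

    /**
     * Polls the supplied check until it reports true or the timeout elapses.
     * Returns true if the check succeeded in time, false on timeout, so the
     * caller can fail the test instead of hanging forever.
     */
    static boolean waitFor(BooleanSupplier allReplicasActive,
                           Duration timeout,
                           Duration pollInterval) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!allReplicasActive.getAsBoolean()) {
            // Overflow-safe comparison against the monotonic clock.
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up: some replicas never left DOWN
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: a real check would inspect the cluster state; here it never succeeds.
        boolean ok = waitFor(() -> false, Duration.ofMillis(200), Duration.ofMillis(20));
        System.out.println(ok ? "all replicas active" : "timed out waiting for replicas");
    }
}
```

A timeout like this only keeps the test run moving; it does not address the underlying failure to reconnect.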
> ZooKeeper calls can easily not be retried enough on ConnectionLoss.
> -------------------------------------------------------------------
>
> Key: SOLR-6405
> URL: https://issues.apache.org/jira/browse/SOLR-6405
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Mark Miller
> Assignee: Mark Miller
> Priority: Critical
> Fix For: 4.10, 6.0
>
> Attachments: SOLR-6405.patch
>
>
> The current design requires that we are sure we retry on connection loss
> until session expiration.
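As a rough illustration of the requirement quoted above (keep retrying on connection loss for as long as the session could still be alive), here is a minimal sketch. It is not Solr's actual retry code and does not use the ZooKeeper client: ZkRetrySketch, retryOnConnectionLoss, and the nested ConnectionLossException are hypothetical stand-ins, and the local deadline only approximates session expiration, since expiry is decided by the ZooKeeper server.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch; names are illustrative and not taken from Solr or ZooKeeper.
final class ZkRetrySketch {

    /** Stand-in for a ZooKeeper connection-loss error (assumption, not the real class). */
    public static class ConnectionLossException extends Exception {}

    /**
     * Runs op, retrying whenever it fails with ConnectionLossException, until
     * roughly sessionTimeoutMs has elapsed. After that the last connection-loss
     * error is rethrown, because the session can no longer be assumed alive.
     */
    public static <T> T retryOnConnectionLoss(Callable<T> op,
                                              long sessionTimeoutMs,
                                              long pauseMs) throws Exception {
        long deadline = System.nanoTime() + sessionTimeoutMs * 1_000_000L;
        while (true) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                // Stop retrying once the approximate session lifetime is used up.
                if (System.nanoTime() - deadline >= 0) {
                    throw e;
                }
                Thread.sleep(pauseMs); // short pause before the next attempt
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulated operation: fails twice with connection loss, then succeeds.
        String result = retryOnConnectionLoss(() -> {
            if (attempts[0]++ < 2) throw new ConnectionLossException();
            return "done";
        }, 5_000, 10);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

The point of tying the retry budget to the session timeout rather than to a fixed retry count is that a fixed count can run out while the session is still valid, which is the failure mode the issue title describes.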