[ 
https://issues.apache.org/jira/browse/SOLR-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107289#comment-14107289
 ] 

Jessica Cheng Mallet commented on SOLR-6405:
--------------------------------------------

Right. Most likely the first ConnectionLoss we see doesn't arrive at time=0 of 
the actual connection loss, so by loop i=4 we would have slept for 15s since 
i=0 and would therefore hit a SessionExpired.

But then, thinking about it again, why be clever at all about the padding or 
back-off?

Not to propose that we change this now, but let's pretend we don't do back-off 
and just sleep 1s between each loop. If we get ConnectionLoss back on the next 
attempt, there's no harm in having tried, because if we're disconnected the 
attempt wouldn't hit ZooKeeper anyway. If we get SessionExpired back, great, 
we can break out and throw the exception. If we've reconnected, then yay, we 
succeeded. Because each call returns either success, failure (SessionExpired), 
or "in progress" (ConnectionLoss), we can really just retry "forever" without 
limiting the loop count. The only exception is if we're worried that we'll 
somehow keep getting ConnectionLoss even though the session has expired, but 
that would be a pretty serious ZooKeeper client bug. And if we're really 
worried about that, we can always do, say, 10 more loops after we have already 
slept for a total of the timeout.
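
To make the idea concrete, here is a rough sketch of the loop described above. 
It is only an illustration, not the attached patch or the existing Solr code: 
the class name, method name, and parameters are made up, and the two exception 
classes are stand-ins for ZooKeeper's KeeperException.ConnectionLossException 
and KeeperException.SessionExpiredException so that the example compiles on 
its own.

```java
import java.util.concurrent.Callable;

class RetryUntilExpired {
    // Hypothetical stand-ins for the ZooKeeper client's ConnectionLoss and
    // SessionExpired exceptions, defined here so the sketch is self-contained.
    static class ConnectionLoss extends Exception {}
    static class SessionExpired extends Exception {}

    /**
     * Retries op with a fixed sleep (no back-off) for as long as it reports
     * ConnectionLoss. SessionExpired, or any other exception, propagates
     * immediately. As a safety valve, once the total time slept reaches
     * sessionTimeoutMs, at most extraAttempts further retries are made
     * before the ConnectionLoss itself is rethrown.
     */
    static <T> T retryUntilExpired(Callable<T> op, long sleepMs,
                                   long sessionTimeoutMs, int extraAttempts)
            throws Exception {
        long sleptMs = 0;
        int attemptsPastTimeout = 0;
        while (true) {
            try {
                return op.call();          // success: we have reconnected
            } catch (ConnectionLoss e) {   // "in progress": try again
                if (sleptMs >= sessionTimeoutMs
                        && ++attemptsPastTimeout > extraAttempts) {
                    throw e;               // should only happen on a client bug
                }
                Thread.sleep(sleepMs);
                sleptMs += sleepMs;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation: two ConnectionLoss results, then success.
        String result = retryUntilExpired(() -> {
            if (calls[0]++ < 2) {
                throw new ConnectionLoss();
            }
            return "ok";
        }, 10, 1000, 10);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

With 1s sleeps and the real session timeout, a call like 
retryUntilExpired(op, 1000, timeout, 10) would correspond to the "10 more 
loops" variant; dropping the safety valve gives the plain retry-"forever" 
version.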

In the end, it's really weird that this method can ever throw a ConnectionLoss 
exception just because we got the math wrong, since the intent is to retry 
until we get a SessionExpired, isn't it?

> ZooKeeper calls can easily not be retried enough on ConnectionLoss.
> -------------------------------------------------------------------
>
>                 Key: SOLR-6405
>                 URL: https://issues.apache.org/jira/browse/SOLR-6405
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Critical
>             Fix For: 5.0, 4.10
>
>         Attachments: SOLR-6405.patch
>
>
> The current design requires that we are sure we retry on connection loss 
> until session expiration.



