[ https://issues.apache.org/jira/browse/SOLR-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107289#comment-14107289 ]
Jessica Cheng Mallet commented on SOLR-6405:
--------------------------------------------

Right, most likely the first time it hits the ConnectionLoss it's not time=0 of the connection loss, so by loop i=4, it would've slept for 15s since i=0 and therefore hit a SessionExpired.

But then, thinking about it again, why be clever at all about the padding or back-off? Not to propose that we change this now, but let's pretend we don't do back-off and just sleep 1s between each loop. If we were to get ConnectionLoss back in the next attempt, there's no harm in trying at all, because if we're disconnected, the attempt wouldn't be hitting zookeeper anyway. If we were to get SessionExpired back, great, we can break out now and throw the exception. If we've reconnected, then yay, we succeeded.

Because with each call we're expecting to get either success, failure (SessionExpired), or "in progress" (ConnectionLoss), we can really just retry "forever" without limiting the loop count (unless we're worried that somehow we'll keep getting ConnectionLoss even though the session has expired, but that'd be a pretty serious zookeeper client bug. And if we're really worried about that, we can always, say, do 10 more loops after we have slept a total of timeout already).

In the end, it's really weird that this method should ever semantically allow throwing a ConnectionLoss exception if we got the math wrong, because the intent is to retry until we get a SessionExpired, isn't it?

> ZooKeeper calls can easily not be retried enough on ConnectionLoss.
> -------------------------------------------------------------------
>
>                 Key: SOLR-6405
>                 URL: https://issues.apache.org/jira/browse/SOLR-6405
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Critical
>             Fix For: 5.0, 4.10
>
>         Attachments: SOLR-6405.patch
>
>
> The current design requires that we are sure we retry on connection loss
> until session expiration.
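
The loop described in the comment above (fixed sleep, retry on ConnectionLoss, stop on SessionExpired, with the optional "a few more loops after timeout" guard) might be sketched roughly as follows. This is only an illustration, not the code in SOLR-6405.patch: the class name, method name, and parameters are hypothetical, and the two exception classes are local stand-ins for ZooKeeper's KeeperException.ConnectionLossException and KeeperException.SessionExpiredException so that the sketch compiles without the ZooKeeper jar.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch; none of these names come from Solr or the patch.
final class RetryUntilExpired {

    /** Stand-in for KeeperException.ConnectionLossException ("in progress"). */
    public static class ConnectionLossException extends Exception {}

    /** Stand-in for KeeperException.SessionExpiredException (definitive failure). */
    public static class SessionExpiredException extends Exception {}

    /**
     * Runs op until it succeeds or fails with something other than
     * ConnectionLoss. Sleeps a fixed interval between attempts (no back-off).
     * Once the total time slept reaches sessionTimeoutMillis, at most
     * extraAttempts further retries are allowed before ConnectionLoss is
     * rethrown; this is the guard against a client that never reports
     * expiry.
     */
    public static <T> T retry(Callable<T> op,
                              long sleepMillis,
                              long sessionTimeoutMillis,
                              int extraAttempts) throws Exception {
        long slept = 0;   // total time spent sleeping so far
        int extra = 0;    // retries used after the timeout was reached
        while (true) {
            try {
                return op.call();                 // reconnected: success
            } catch (ConnectionLossException e) {
                // Still disconnected. Retrying is harmless because the
                // attempt does not reach the server while disconnected.
                if (slept >= sessionTimeoutMillis && extra++ >= extraAttempts) {
                    throw e;                      // safety valve only
                }
                Thread.sleep(sleepMillis);
                slept += sleepMillis;
            }
            // SessionExpiredException is not caught: it propagates at once.
        }
    }
}
```

With this shape, a ConnectionLoss can only escape through the safety valve, which matches the point that the method's intent is to retry until SessionExpired rather than to get the back-off arithmetic exactly right.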