Hangleton commented on code in PR #13558:
URL: https://github.com/apache/kafka/pull/13558#discussion_r1198005710


##########
core/src/main/scala/kafka/zk/KafkaZkClient.scala:
##########
@@ -2113,7 +2113,12 @@ class KafkaZkClient private[zk] (zooKeeperClient: ZooKeeperClient, isSecure: Boo
   }
 
   private class CheckedEphemeral(path: String, data: Array[Byte]) extends Logging {
+    private var attempt = 0
+    private val maxAttempt = 5
+    private val backoffMs = 1000

Review Comment:
   This is indeed an important element to consider when configuring the retry strategy. We probably don't want to retry past the session timeout, because doing so exposes us to the possibility that the session used during the first call is no longer valid.
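   For illustration, here is a minimal sketch of what bounding retries by the session timeout could look like. Everything here (the object name, `sessionTimeoutMs`, `backoffMs`, `op`) is hypothetical and not part of the patch; it only shows the idea of stopping once the next attempt could overrun the session window.

```scala
// Hypothetical sketch: stop retrying once a further backoff would overrun the
// ZooKeeper session timeout, instead of relying only on a fixed attempt count.
object SessionBoundedRetry {
  def retry[T](sessionTimeoutMs: Long, backoffMs: Long)(op: () => Option[T]): Option[T] = {
    val deadline = System.nanoTime() + sessionTimeoutMs * 1000000L
    var result: Option[T] = op()
    // Back off and retry only while a full backoff interval still fits
    // within the session timeout window.
    while (result.isEmpty && System.nanoTime() + backoffMs * 1000000L < deadline) {
      Thread.sleep(backoffMs)
      result = op()
    }
    result
  }
}
```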
   
   Additionally, the session timeout can be used to calibrate the response times to be expected, since the ZooKeeper client in Kafka does not make use of request timeouts. Another timeout to consider is the read timeout, which is the maximum time allowed without activity on a connection before that connection is closed. The ZooKeeper protocol includes heartbeats (pings) performed at least once every half of the read timeout. The read timeout is set to the session timeout multiplied by 2/3 and divided by the number of hosts registered in the ZooKeeper cluster.
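   As a back-of-the-envelope check of the relationship described above (the figures below are made-up example values, not Kafka or ZooKeeper defaults):

```scala
// Illustrative arithmetic only; 18000 ms and 3 hosts are made-up example values.
val sessionTimeoutMs = 18000
val numHosts = 3
// Read timeout per the description above: session timeout * 2/3, divided by host count.
val readTimeoutMs = sessionTimeoutMs * 2 / 3 / numHosts  // 4000 ms
// Pings are sent at least once every half of the read timeout.
val maxPingIntervalMs = readTimeoutMs / 2                // 2000 ms
```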
   
   I haven't looked at Apache Curator to see what approach it uses for retries, but that could be an interesting entry point, I suppose.
   
   Overall, the retry strategy here requires a bit more thought.
   
   Thanks for raising the point.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
