rdhabalia commented on issue #569: Revert back to default 
ZookeeperClientFactoryImpl
URL: https://github.com/apache/incubator-pulsar/pull/569#issuecomment-315687079
 
 
   After enabling debug logging, I found that the build exits because 
`ZooKeeperSessionWatcher` did not get a heartbeat within the zk session timeout.
   ```
   [pulsar-zk-session-watcher-274-1:ZooKeeperSessionWatcher@164] - zoo keeper disconnected, waiting to reconnect, time remaining 0
   [pulsar-zk-session-watcher-75235-1:ZooKeeperSessionWatcher@158] - timeout expired for reconnecting, invoking shutdown service
   ```
   
   After digging into it, the issue does not seem to be the BK-ZkClient library 
but the time spent processing the zk-response in the aspectj advice. 
[ClientCnxnAspect](https://github.com/apache/incubator-pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/zookeeper/aspectj/ClientCnxnAspect.java#L72)
 intercepts the zk-response call, and if the advice takes more than a few msec, 
the zk-client seems to lose the event somewhere (I am not sure what exactly 
happens in zk-client) and stops serving subsequent zk-responses, which 
ultimately causes the zk-timeout.
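
A toy model may help show why a few msec of extra work per response can matter. It assumes, without confirming it against the zk-client code, that responses are handled one at a time on a single thread, so any time spent in the advice delays everything queued behind it. The class and method names are made up for illustration and do not come from ZooKeeper or Pulsar.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model only: one thread processes every "response" in order, the way a
// single event thread would. None of these names come from ZooKeeper or Pulsar.
class SlowAdviceModel {

    /**
     * Processes `events` responses on a single thread, sleeping `adviceDelayMs`
     * per response to stand in for slow interception code.
     * Returns the elapsed time in milliseconds until the last response is done.
     */
    static long drainTimeMs(int events, long adviceDelayMs) {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        for (int i = 0; i < events; i++) {
            eventThread.submit(() -> {
                try {
                    Thread.sleep(adviceDelayMs); // simulated advice overhead
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Stop accepting work, then wait for the queued responses to drain.
        eventThread.shutdown();
        try {
            eventThread.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

Calling `SlowAdviceModel.drainTimeMs(20, 10)` makes the last of 20 queued responses wait for the delays of all the earlier ones. If heartbeat responses sit in the same queue, that kind of accumulated delay is one plausible way to run past the session timeout.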
   
   This can be easily verified as follows.
   **Build does not fail if:** the [event-notification at 
timedProcessEvent](https://github.com/apache/incubator-pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/zookeeper/aspectj/ClientCnxnAspect.java#L81) 
is commented out
   ```java
   if (request != null) {
       long timeElapsed = (MathUtils.now() - startTimeMs);
       // notifyListeners(checkType(request), timeElapsed);
   }
   ```
   
   **Build fails immediately if:** 
`notifyListeners(checkType(request), timeElapsed);` is replaced with 
`Thread.sleep(100)`
   ```java
   if (request != null) {
       long timeElapsed = (MathUtils.now() - startTimeMs);
       // if this takes more than a few msec, the zk-client lib misbehaves
       Thread.sleep(100);
   }
   ```
   
   I am testing the 
[fix](https://github.com/rdhabalia/pulsar/commit/af6734d2da66a0605f9cb0a96f116345502de74b),
 and will create a PR after testing it multiple times.
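
For illustration only, one way to keep the advice cheap is to hand the listener notification to a separate executor, so that the intercepted thread only pays for a queue hand-off. This is a sketch under that assumption and is not taken from the linked commit; `AsyncLatencyNotifier` and its methods are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical helper, not part of Pulsar: moves listener callbacks off the
// calling thread so the caller only pays for a queue hand-off.
class AsyncLatencyNotifier {
    private final List<BiConsumer<String, Long>> listeners = new CopyOnWriteArrayList<>();
    private final ExecutorService notifier = Executors.newSingleThreadExecutor();

    void addListener(BiConsumer<String, Long> listener) {
        listeners.add(listener);
    }

    // Returns immediately; listeners run later on the notifier thread.
    void notifyListeners(String eventType, long latencyMs) {
        notifier.execute(() -> listeners.forEach(l -> l.accept(eventType, latencyMs)));
    }

    // Stops accepting work and waits for queued notifications to finish.
    boolean close(long timeoutMs) {
        notifier.shutdown();
        try {
            return notifier.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this shape, a slow listener delays only other notifications on the notifier thread, not the thread that called `notifyListeners`. The trade-off is that notifications arrive asynchronously and the queue is unbounded, which may or may not be acceptable for the real listeners.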
   
 
