[ https://issues.apache.org/jira/browse/ZOOKEEPER-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kezhu Wang resolved ZOOKEEPER-3315.
-----------------------------------
    Resolution: Information Provided

I second [~ctubbsii]'s point. Each implementation of `MultiCallback` should be capable of doing what you are requesting, and it has the full context.

> Exceptions in callbacks should be handlable by the application
> --------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3315
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3315
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Steven McDonald
>            Priority: Major
>         Attachments: ExceptionTest.java
>
>
> Hi,
> In [KAFKA-7898|https://issues.apache.org/jira/browse/KAFKA-7898], a {{NullPointerException}} in a {{MultiCallback}} caused a Kafka cluster to become unhealthy in such a way that manual intervention was needed to recover. The cause of this particular {{NullPointerException}} is fixed in Kafka 2.2.x (with a proposed documentation update in [ZOOKEEPER-3314|https://issues.apache.org/jira/projects/ZOOKEEPER/issues/ZOOKEEPER-3314]), but I am interested in improving the resiliency of Kafka (and by extension the ZooKeeper client library) against such bugs.
> Quoting the stack trace from KAFKA-7898:
> {code}
> [2019-02-05 14:28:12,525] ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn)
> java.lang.NullPointerException
>     at kafka.zookeeper.ZooKeeperClient$$anon$8.processResult(ZooKeeperClient.scala:217)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:633)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
> {code}
> The "caught unexpected throwable" message comes from [within the ZooKeeper client library|https://github.com/apache/zookeeper/blob/release-3.4.13/src/java/main/org/apache/zookeeper/ClientCnxn.java#L641].
> I think that try/catch is pointless, because removing it causes the message to instead be logged [here|https://github.com/apache/zookeeper/blob/release-3.4.13/src/java/main/org/apache/zookeeper/server/ZooKeeperThread.java#L60], with no discernible change in behaviour otherwise. Explicitly exiting the {{EventThread}} when this happens does not help (I don't think it gets restarted).
> This is especially problematic with distributed applications, since they are generally designed to tolerate the loss of a node, so it is preferable to have the application be allowed to terminate itself rather than risk inconsistent state.
> I am attaching a simple ZooKeeper client which does nothing except throw a {{NullPointerException}} as soon as it receives a callback, to illustrate the problem. Running this results in:
> {code}
> 232 [main-EventThread] ERROR org.apache.zookeeper.ClientCnxn - Error while calling watcher
> java.lang.NullPointerException
>     at ExceptionTest.process(ExceptionTest.java:31)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:539)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:514)
> {code}
> This comes from [here|https://github.com/apache/zookeeper/blob/7256d01a26412cd35a46edab6de9ac8c5adf5bb3/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L541], which simply logs the occurrence but provides no way for my application to handle the failure.
> I suspect the best approach here might be to allow the application to register a callback to notify it of unhandlable exceptions within the ZooKeeper library, since ZooKeeper has no way of knowing what approach makes sense for the application. Of course, this is already technically possible in this case by having the application catch all exceptions in every callback, but that doesn't seem very practical.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
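
For illustration, the workaround mentioned at the end of the description (catching all exceptions in every callback) could be centralized in a single wrapper so that each callback does not need its own try/catch. The following is only a minimal sketch under stated assumptions: `EventCallback`, `GuardedCallback`, and `GuardedCallbackDemo` are hypothetical names, and `EventCallback` is a stand-in for a ZooKeeper callback interface so that the example runs without the client library on the classpath. None of these types are part of the ZooKeeper API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a ZooKeeper callback interface; not a ZooKeeper type. */
interface EventCallback {
    void process(String event);
}

/** Wraps a callback so that anything it throws is passed to an application-supplied handler. */
final class GuardedCallback implements EventCallback {
    private final EventCallback delegate;
    private final Consumer<Throwable> onFailure;

    GuardedCallback(EventCallback delegate, Consumer<Throwable> onFailure) {
        this.delegate = delegate;
        this.onFailure = onFailure;
    }

    @Override
    public void process(String event) {
        try {
            delegate.process(event);
        } catch (Throwable t) {
            // The application, not the library, decides what to do here:
            // log, raise an alert, or terminate the process.
            onFailure.accept(t);
        }
    }
}

final class GuardedCallbackDemo {
    /** Runs a deliberately buggy callback through the guard and returns what the handler saw. */
    static List<Throwable> runDemo() {
        List<Throwable> seen = new ArrayList<>();
        // Simulates a bug inside a callback, like the NullPointerException in the report.
        EventCallback buggy = event -> {
            throw new NullPointerException("bug while handling " + event);
        };
        EventCallback guarded = new GuardedCallback(buggy, seen::add);
        guarded.process("SyncConnected");
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("handler received: " + runDemo());
    }
}
```

With the real client, a wrapper of the same shape could presumably be written around a `Watcher` or `MultiCallback` implementation, with the failure handler choosing to terminate the process rather than continue in an inconsistent state.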