[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kezhu Wang resolved ZOOKEEPER-3315.
-----------------------------------
    Resolution: Information Provided

I second [~ctubbsii]'s point. Each implementation of `MultiCallback` should be 
capable of doing what you are requesting, and it has the full context needed to decide how.
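A minimal sketch of that per-callback approach follows, assuming the application wants any failure routed to its own handler instead of escaping into the client's event thread. To compile without the ZooKeeper jar, `MultiResultCallback` is a hypothetical stand-in with the same shape as `AsyncCallback.MultiCallback.processResult(int rc, String path, Object ctx, List<OpResult> opResults)`, with `OpResult` replaced by `Object`. `GuardedCallbacks.guard` is likewise an illustrative name, not a ZooKeeper API.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in mirroring the shape of ZooKeeper's
// AsyncCallback.MultiCallback; OpResult is replaced by Object here.
interface MultiResultCallback {
    void processResult(int rc, String path, Object ctx, List<Object> opResults);
}

final class GuardedCallbacks {
    private GuardedCallbacks() {}

    // Wraps a callback so that nothing it throws escapes to the caller
    // (in ZooKeeper, that caller would be the client's event thread).
    // Any Throwable is handed to onFailure, which the application chooses.
    static MultiResultCallback guard(MultiResultCallback delegate,
                                     Consumer<Throwable> onFailure) {
        return (rc, path, ctx, opResults) -> {
            try {
                delegate.processResult(rc, path, ctx, opResults);
            } catch (Throwable t) {
                onFailure.accept(t);
            }
        };
    }
}
```

In real code the wrapper would implement `AsyncCallback.MultiCallback` directly, and `onFailure` could log, mark the node unhealthy, or stop the process, whichever fits the application.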

> Exceptions in callbacks should be handlable by the application
> --------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3315
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3315
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Steven McDonald
>            Priority: Major
>         Attachments: ExceptionTest.java
>
>
> Hi,
> In [KAFKA-7898|https://issues.apache.org/jira/browse/KAFKA-7898], a 
> {{NullPointerException}} in a {{MultiCallback}} caused a Kafka cluster to 
> become unhealthy in such a way that manual intervention was needed to 
> recover. The cause of this particular {{NullPointerException}} is fixed in 
> Kafka 2.2.x (with a proposed documentation update in 
> [ZOOKEEPER-3314|https://issues.apache.org/jira/projects/ZOOKEEPER/issues/ZOOKEEPER-3314]),
> but I am interested in improving the resiliency of Kafka (and by extension 
> the Zookeeper client library) against such bugs.
> Quoting the stack trace from KAFKA-7898:
> {code}
> [2019-02-05 14:28:12,525] ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn)
> java.lang.NullPointerException
> at kafka.zookeeper.ZooKeeperClient$$anon$8.processResult(ZooKeeperClient.scala:217)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:633)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
> {code}
> The "caught unexpected throwable" message comes from [within the Zookeeper 
> client 
> library|https://github.com/apache/zookeeper/blob/release-3.4.13/src/java/main/org/apache/zookeeper/ClientCnxn.java#L641].
> I think that try/catch is pointless, because removing it causes the message 
> to be logged 
> [here|https://github.com/apache/zookeeper/blob/release-3.4.13/src/java/main/org/apache/zookeeper/server/ZooKeeperThread.java#L60]
> instead, with no discernible change in behaviour otherwise. Explicitly exiting 
> the {{EventThread}} when this happens does not help either (I don't think it 
> gets restarted).
> This is especially problematic for distributed applications: they are 
> generally designed to tolerate the loss of a node, so it is preferable to 
> let the application terminate itself rather than risk inconsistent state.
> I am attaching a simple Zookeeper client which does nothing except throw a 
> {{NullPointerException}} as soon as it receives a callback, to illustrate the 
> problem. Running this results in:
> {code}
> 232 [main-EventThread] ERROR org.apache.zookeeper.ClientCnxn  - Error while calling watcher
> java.lang.NullPointerException
>         at ExceptionTest.process(ExceptionTest.java:31)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:539)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:514)
> {code}
> This comes from 
> [here|https://github.com/apache/zookeeper/blob/7256d01a26412cd35a46edab6de9ac8c5adf5bb3/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L541],
> which simply logs the occurrence but provides no way for my application to 
> handle the failure.
> I suspect the best approach here might be to let the application register a 
> handler that is notified of exceptions escaping its callbacks into the 
> Zookeeper library, since Zookeeper has no way of knowing what response makes 
> sense for the application. Of course, this is already technically possible 
> by having the application catch all exceptions in every callback, but that 
> doesn't seem very practical.
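The proposed hook can be sketched as a simplified, single-threaded model of an event thread that drains queued callbacks and passes any exception to a handler the application registers, rather than only logging it. `CallbackDispatcher` and `setCallbackExceptionHandler` are hypothetical names for illustration; they are not part of the ZooKeeper client API, and the real dispatch in `ClientCnxn.EventThread` is more involved.

```java
import java.util.ArrayDeque;
import java.util.Objects;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical model of an event thread with a pluggable handler for
// exceptions thrown by application callbacks. Not a ZooKeeper API.
final class CallbackDispatcher {
    private final Queue<Runnable> pending = new ArrayDeque<>();

    // Default handler mimics the behaviour described in the issue:
    // report the exception and keep going.
    private Consumer<Throwable> callbackExceptionHandler =
            t -> System.err.println("Error while calling callback: " + t);

    // Lets the application decide what an escaped exception means.
    void setCallbackExceptionHandler(Consumer<Throwable> handler) {
        this.callbackExceptionHandler = Objects.requireNonNull(handler);
    }

    void enqueue(Runnable callback) {
        pending.add(callback);
    }

    // Runs every queued callback. A failure in one callback goes to the
    // handler and does not stop later callbacks. If the handler itself
    // throws, the exception propagates out of drain() and any remaining
    // callbacks stay queued.
    void drain() {
        Runnable callback;
        while ((callback = pending.poll()) != null) {
            try {
                callback.run();
            } catch (Throwable t) {
                callbackExceptionHandler.accept(t);
            }
        }
    }
}
```

With a hook like this, an application that prefers to fail fast could register a handler that shuts the process down, while others could keep the log-and-continue default.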



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
