Hi everyone, I'm trying to evaluate a patch that Jeremy Stribling has submitted, and I'd like some feedback from the user base on it. https://issues.apache.org/jira/browse/ZOOKEEPER-1442
The current behavior of ZK when it hits an uncaught exception is to log it and try to move on. This is arguably not the right thing to do, and it can leave ZK limping along with a bad VM (say, in an OOM state) for longer than it should. The patch proposes that when we get an instance of java.lang.Error, we call System.exit to fast-fail the process. With the possible exception of ThreadDeath (which may or may not indicate an unrecoverable system state, depending on the thread), I think this makes sense, but I would like to hear from others who have an opinion. I think it's better to kill the process and let your monitoring services detect the process death (and restart it) than to possibly linger unresponsive for a while. Are there scenarios we're missing where such an error can occur and you wouldn't want the process killed?

Thanks for your feedback,
Camille
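
P.S. For anyone who hasn't looked at the patch, here is a rough sketch of the kind of behavior being discussed. This is not the code from the patch: the class name FailFastHandler and the injectable exit hook are made up for illustration, and the sketch does not special-case ThreadDeath.

```java
import java.util.function.IntConsumer;

// Hypothetical illustration only; not taken from ZOOKEEPER-1442.
class FailFastHandler implements Thread.UncaughtExceptionHandler {
    // Exit hook is injectable so the decision logic can be exercised
    // without actually killing the JVM.
    private final IntConsumer exiter;

    FailFastHandler() {
        this(System::exit);
    }

    FailFastHandler(IntConsumer exiter) {
        this.exiter = exiter;
    }

    // Any java.lang.Error (OutOfMemoryError, StackOverflowError, ...) is
    // treated as fatal. A carve-out for ThreadDeath would go here if we
    // decide we want one.
    static boolean isFatal(Throwable e) {
        return e instanceof Error;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        // Always log first, as ZK does today.
        System.err.println("Uncaught exception in thread " + t.getName() + ": " + e);
        if (isFatal(e)) {
            // Fast-fail so external monitoring can see the process die
            // and restart it.
            exiter.accept(1);
        }
        // Anything else is logged and the process carries on.
    }
}
```

Installing it process-wide would be something like Thread.setDefaultUncaughtExceptionHandler(new FailFastHandler()). With that in place, a RuntimeException escaping a thread is still only logged, while an OutOfMemoryError takes the process down with exit code 1.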
