James Chen created LIVY-852:
-------------------------------
Summary: Livy unable to recover upon losing connection with Zookeeper
Key: LIVY-852
URL: https://issues.apache.org/jira/browse/LIVY-852
Project: Livy
Issue Type: Bug
Components: Server
Affects Versions: 0.6.0
Reporter: James Chen
We've noticed that LIVY-732 appears to change Livy's behavior upon loss of
connection with Zookeeper. Before that change, Livy would exit with an exit
code of 1 when it lost its connection with Zookeeper, allowing it to be
restarted. Now, however, Livy continues to run but returns a 404 upon
interaction with the REST API:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 404 </title>
</head>
<body>
<h2>HTTP ERROR: 404</h2>
<p>Problem accessing /sessions. Reason:
<pre> Not Found</pre></p>
<hr /><a href="http://eclipse.org/jetty">Powered by Jetty://
9.3.24.v20180605</a><hr/>
</body>
</html>
The direct cause of this change in behavior appears to be that the
UnhandledErrorListener was changed from calling System.exit(1) to throwing a
LivyUncaughtException; see line 74 of
server/src/main/scala/org/apache/livy/server/recovery/ZooKeeperStateStore.scala
and line 72 of
server/src/main/scala/org/apache/livy/server/recovery/ZooKeeperManager.scala
at [https://github.com/apache/incubator-livy/pull/267/files].
Overall, this change appears to be undesirable: Livy becomes completely
unresponsive after Zookeeper reconnects, no log or error messages are printed
after the uncaught exception is thrown, and the server has to be checked and
restarted manually. On the other hand, System.exit(1) seems to be a roundabout
way of fixing the issue, and specifying a ConnectionStateListener instead of
an UnhandledErrorListener might be better.
It would be good to figure out if this line should be reverted to a
System.exit(1), or if there is a better way of handling this issue.
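For discussion, here is a minimal sketch of what the ConnectionStateListener
alternative could look like. It is not taken from the Livy code base: the
ZkConnectionWatch object, its isFatal and register methods, and the onFatal
callback are hypothetical names, and it assumes the Apache Curator client
classes are on the classpath. Treating only LOST as fatal is also an
assumption; the exact meaning of LOST differs between Curator versions.

```scala
import org.apache.curator.framework.CuratorFramework
import org.apache.curator.framework.state.{ConnectionState, ConnectionStateListener}

// Hypothetical helper, not part of Livy: reacts to Curator connection-state
// transitions instead of relying on an UnhandledErrorListener.
object ZkConnectionWatch {

  // Assumption: only LOST is treated as unrecoverable. SUSPENDED may still
  // recover on its own, and RECONNECTED means the link is back.
  def isFatal(state: ConnectionState): Boolean =
    state == ConnectionState.LOST

  // Registers a listener on the given client that invokes onFatal when the
  // connection is reported as lost. What onFatal does is up to the caller:
  // it could log and exit so a supervisor restarts the server, or it could
  // attempt to rebuild the state store in-process.
  def register(client: CuratorFramework)(onFatal: () => Unit): Unit = {
    val listener = new ConnectionStateListener {
      override def stateChanged(c: CuratorFramework, newState: ConnectionState): Unit = {
        if (isFatal(newState)) onFatal()
      }
    }
    client.getConnectionStateListenable().addListener(listener)
  }
}
```

To reproduce the old exit-and-restart behavior, the wiring could be as short
as ZkConnectionWatch.register(curatorClient) { () => sys.exit(1) }, where
curatorClient stands for whatever CuratorFramework instance the state store
already holds. Unlike the UnhandledErrorListener path, this reacts to the
connection state itself rather than to an exception thrown on a background
thread.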