James Chen created LIVY-852:
-------------------------------

             Summary: Livy unable to recover upon losing connection with Zookeeper
                 Key: LIVY-852
                 URL: https://issues.apache.org/jira/browse/LIVY-852
             Project: Livy
          Issue Type: Bug
          Components: Server
    Affects Versions: 0.6.0
            Reporter: James Chen


We've noticed that LIVY-732 appears to have changed Livy's behavior when it 
loses its connection to ZooKeeper. Before that pull request, Livy would exit 
with an exit code of 1 when the connection was lost, allowing it to be 
restarted. Now Livy keeps running but returns a 404 on any interaction with 
the REST API:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 404 </title>
</head>
<body>
<h2>HTTP ERROR: 404</h2>
<p>Problem accessing /sessions. Reason:
<pre> Not Found</pre></p>
<hr /><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.3.24.v20180605</a><hr/>
</body>
</html>



The direct cause of this change in behavior appears to be that the 
UnhandledErrorListener was changed from calling System.exit(1) to throwing a 
LivyUncaughtException. See line 74 of 
server/src/main/scala/org/apache/livy/server/recovery/ZooKeeperStateStore.scala 
and line 72 of 
server/src/main/scala/org/apache/livy/server/recovery/ZooKeeperManager.scala 
at https://github.com/apache/incubator-livy/pull/267/files

 

Overall, this change appears to be undesirable: Livy becomes completely 
unresponsive after ZooKeeper reconnects, prints no log or error messages after 
the uncaught exception is thrown, and has to be checked and restarted 
manually. On the other hand, System.exit(1) seems to be a roundabout way of 
fixing the issue, and registering a ConnectionStateListener instead of an 
UnhandledErrorListener might be better.
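
To make the ConnectionStateListener idea concrete, here is a minimal sketch, 
not a tested patch. It assumes the Curator client that Livy already creates is 
available as a CuratorFramework; the object and method names 
(ZkConnectionWatchSketch, exitOnLostConnection) are hypothetical and do not 
exist in Livy. Whether to exit on LOST or to try to recover in place is exactly 
the open question of this issue.

```scala
import org.apache.curator.framework.CuratorFramework
import org.apache.curator.framework.state.{ConnectionState, ConnectionStateListener}

// Hypothetical helper, not part of Livy.
object ZkConnectionWatchSketch {

  // Registers a listener that logs connection state changes and stops the
  // process once Curator reports the ZooKeeper connection as LOST.
  def exitOnLostConnection(client: CuratorFramework): Unit = {
    val listener = new ConnectionStateListener {
      override def stateChanged(c: CuratorFramework, newState: ConnectionState): Unit =
        newState match {
          case ConnectionState.LOST =>
            // Assumption: restoring the old behavior (exit code 1) is acceptable,
            // so that an external supervisor can restart the server.
            System.err.println("ZooKeeper connection lost; exiting")
            System.exit(1)
          case ConnectionState.SUSPENDED =>
            System.err.println("ZooKeeper connection suspended; waiting for reconnect")
          case ConnectionState.RECONNECTED =>
            System.err.println("ZooKeeper connection re-established")
          case _ =>
            () // CONNECTED, READ_ONLY: nothing to do in this sketch
        }
    }
    client.getConnectionStateListenable.addListener(listener)
  }
}
```

The helper would be called once, for example right before client.start(), so 
that a lost connection is at least logged explicitly instead of surfacing only 
as 404 responses.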

 

It would be good to figure out if this line should be reverted to a 
System.exit(1), or if there is a better way of handling this issue.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)