[
https://issues.apache.org/jira/browse/LIVY-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Chen closed LIVY-852.
---------------------------
Resolution: Cannot Reproduce
On second thought, this may have been a false alarm: there was a local change
that was incompatible with this one. Closing for now.
> Livy unable to recover upon losing connection with Zookeeper
> ------------------------------------------------------------
>
> Key: LIVY-852
> URL: https://issues.apache.org/jira/browse/LIVY-852
> Project: Livy
> Issue Type: Bug
> Components: Server
> Affects Versions: 0.6.0
> Reporter: James Chen
> Priority: Major
>
> We've noticed that LIVY-732 appears to change Livy's behavior upon loss of
> connection with ZooKeeper. Before that pull request, Livy would exit with
> exit code 1 when the connection was lost, allowing it to be restarted. Now,
> however, Livy continues to run but returns a 404 when the REST API is
> called:
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
> <title>Error 404 </title>
> </head>
> <body>
> <h2>HTTP ERROR: 404</h2>
> <p>Problem accessing /sessions. Reason:
> <pre> Not Found</pre></p>
> <hr /><a href="http://eclipse.org/jetty">Powered by Jetty://
> 9.3.24.v20180605</a><hr/>
> </body>
> </html>
> The direct cause of this change in behavior appears to be that the
> UnhandledErrorListener was changed from calling System.exit(1) to throwing a
> LivyUncaughtException. See line 74 of
> server/src/main/scala/org/apache/livy/server/recovery/ZooKeeperStateStore.scala
> and line 72 of
> server/src/main/scala/org/apache/livy/server/recovery/ZooKeeperManager.scala
> at https://github.com/apache/incubator-livy/pull/267/files
>
> As a whole, this change appears to be undesirable: Livy becomes completely
> unresponsive after ZooKeeper reconnects (no log or error messages are
> printed after the uncaught exception is thrown) and has to be manually
> checked and restarted. On the other hand, System.exit(1) seems to be a
> roundabout way of fixing the issue, and specifying a ConnectionStateListener
> instead of an UnhandledErrorListener might be better.
>
> It would be good to figure out if this line should be reverted to a
> System.exit(1), or if there is a better way of handling this issue.
>
>
>
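For reference, the ConnectionStateListener idea raised in the description
could look roughly like the sketch below. It is not Livy's code and it does
not use Curator itself: ConnState is a local stand-in for Curator's
ConnectionState enum, and ZkConnectionPolicy, onLost and log are hypothetical
names introduced only for illustration. The point is that a state listener
can tell a temporary suspension apart from a lost session, and that the
reaction to a lost session (for example exiting with code 1) is explicit
rather than a side effect of an exception thrown on a background thread.

```scala
// Local stand-in for Curator's ConnectionState enum, so that the sketch
// runs without the Curator dependency. (Hypothetical, not Curator's type.)
object ConnState extends Enumeration {
  val Connected, Suspended, Reconnected, Lost = Value
}

// Hypothetical policy object: decides what to do on each state transition.
// `onLost` is injected so a server could pass () => sys.exit(1), while a
// test can pass a function that merely records the call.
final class ZkConnectionPolicy(onLost: () => Unit, log: String => Unit) {
  @volatile private var healthy = true

  // True while the connection is believed to be usable.
  def isHealthy: Boolean = healthy

  def stateChanged(newState: ConnState.Value): Unit = newState match {
    case ConnState.Connected | ConnState.Reconnected =>
      healthy = true
      log(s"ZooKeeper connection state: $newState")
    case ConnState.Suspended =>
      // Temporary loss: mark unhealthy, but give the client time to reconnect.
      healthy = false
      log("ZooKeeper connection suspended; waiting for reconnect")
    case ConnState.Lost =>
      // Session is gone: log loudly, then run the configured reaction.
      healthy = false
      log("ZooKeeper session lost")
      onLost()
  }
}
```

With the real library, such a policy would presumably be driven from a
ConnectionStateListener registered through
client.getConnectionStateListenable.addListener(...), forwarding the
newState argument of stateChanged(client, newState). Whether exiting on LOST
is the right reaction for Livy is exactly the open question in the issue.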
--
This message was sent by Atlassian Jira
(v8.3.4#803005)