[ 
https://issues.apache.org/jira/browse/LIVY-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Chen closed LIVY-852.
---------------------------
    Resolution: Cannot Reproduce

On second thought, this may have been a false alarm: there was a local change 
that was incompatible with this one. Closing for now.

> Livy unable to recover upon losing connection with Zookeeper
> ------------------------------------------------------------
>
>                 Key: LIVY-852
>                 URL: https://issues.apache.org/jira/browse/LIVY-852
>             Project: Livy
>          Issue Type: Bug
>          Components: Server
>    Affects Versions: 0.6.0
>            Reporter: James Chen
>            Priority: Major
>
> LIVY-732 appears to have changed Livy's behavior upon loss of connection 
> with ZooKeeper. Before that pull request, Livy would exit with an exit code 
> of 1 when the connection was lost, allowing it to be restarted. Now Livy 
> continues to run but returns a 404 upon interaction with the REST API:
> <html>
>  <head>
>  <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
>  <title>Error 404 </title>
>  </head>
>  <body>
>  <h2>HTTP ERROR: 404</h2>
>  <p>Problem accessing /sessions. Reason:
>  <pre> Not Found</pre></p>
>  <hr /><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.3.24.v20180605</a><hr/>
>  </body>
>  </html>
> The direct cause of this change appears to be that the 
> UnhandledErrorListener now throws a LivyUncaughtException instead of calling 
> System.exit(1). See line 74 of 
> server/src/main/scala/org/apache/livy/server/recovery/ZooKeeperStateStore.scala
> and line 72 of 
> server/src/main/scala/org/apache/livy/server/recovery/ZooKeeperManager.scala 
> at https://github.com/apache/incubator-livy/pull/267/files
>  
> On the whole, this change appears to be undesirable: Livy becomes completely 
> unresponsive after ZooKeeper reconnects (no log or error messages are printed 
> after the uncaught exception is thrown) and has to be manually checked and 
> restarted. On the other hand, System.exit(1) seems a roundabout way of 
> handling the problem, and registering a ConnectionStateListener instead of an 
> UnhandledErrorListener might be better.
>  
> It would be good to figure out if this line should be reverted to a 
> System.exit(1), or if there is a better way of handling this issue.
>  
>  
>  
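
For illustration only, the ConnectionStateListener alternative raised in the
description could look roughly like the sketch below. It assumes Apache Curator
is on the classpath and that exiting on a LOST connection is the desired
policy. The object and method names (ZkConnectionSketch, shouldExit,
exitOnLostConnection) are hypothetical and do not come from the Livy code base
or from the LIVY-732 pull request.

```scala
import org.apache.curator.framework.CuratorFramework
import org.apache.curator.framework.state.{ConnectionState, ConnectionStateListener}

// Hypothetical helper, not part of Livy: reacts to connection state changes
// rather than to unhandled background errors.
object ZkConnectionSketch {

  // Assumed policy: only a LOST connection is treated as fatal. SUSPENDED and
  // RECONNECTED are left to Curator's own retry handling.
  def shouldExit(state: ConnectionState): Boolean =
    state == ConnectionState.LOST

  // Registers a listener on an existing Curator client. When the policy above
  // says the state is fatal, the process exits with code 1 so that an external
  // supervisor can restart it.
  def exitOnLostConnection(client: CuratorFramework): Unit = {
    val listener = new ConnectionStateListener {
      override def stateChanged(c: CuratorFramework, newState: ConnectionState): Unit = {
        if (shouldExit(newState)) {
          System.err.println(s"ZooKeeper connection state is $newState; exiting")
          sys.exit(1)
        }
      }
    }
    client.getConnectionStateListenable.addListener(listener)
  }
}
```

Calling ZkConnectionSketch.exitOnLostConnection(client) before client.start()
would make the exit explicit and logged. Whether LOST is the right trigger
depends on the Curator version and the retry policy in use, so treat the
policy function as a placeholder.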



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
