[
https://issues.apache.org/jira/browse/LIVY-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219641#comment-17219641
]
Deepak commented on LIVY-324:
-----------------------------
I have seen this issue in our Spark Yarn application.
The issue in our app was that we were using Qpid JMS with its daemon-thread
setting left at false (the default).
[https://github.com/apache/qpid-jms/blob/0.45.0/qpid-jms-client/src/main/java/org/apache/qpid/jms/JmsConnectionFactory.java#L99]
Setting these threads as daemon threads solved our problem.
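For context, a JVM only exits once every non-daemon thread has finished, so a client library that starts a non-daemon executor thread can keep a Spark driver (and with it the Yarn application) alive after the real work is done. The sketch below uses only JDK classes to show the difference; `DaemonThreadDemo`, `factory` and `workerIsDaemon` are hypothetical names for illustration, not Qpid JMS or Livy code. In Qpid JMS itself the setting is believed to be exposed as the `jms.useDaemonThread` connection URI option, but that should be checked against the client documentation for the version in use.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

class DaemonThreadDemo {
    // Builds worker threads whose daemon flag is chosen by the caller.
    static ThreadFactory factory(boolean daemon) {
        return runnable -> {
            Thread t = new Thread(runnable, "client-worker");
            // A daemon thread does not prevent the JVM from exiting.
            t.setDaemon(daemon);
            return t;
        };
    }

    // Runs a task on a single-thread executor and reports whether
    // the worker thread that ran it was a daemon thread.
    static boolean workerIsDaemon(boolean daemon) {
        ExecutorService executor = Executors.newSingleThreadExecutor(factory(daemon));
        try {
            return executor.submit(() -> Thread.currentThread().isDaemon()).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Without this call, a non-daemon worker would keep the JVM alive.
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("daemon=false -> isDaemon: " + workerIsDaemon(false));
        System.out.println("daemon=true  -> isDaemon: " + workerIsDaemon(true));
    }
}
```

If the `executor.shutdown()` call is removed, the non-daemon variant leaves the process hanging after `main` returns, which matches the symptom described above.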
I also noticed that the Livy server does not forcefully kill sessions, even
when the Yarn app is killed. Maybe this can be improved.
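One possible shape for such an improvement is to wait a bounded time for the graceful end-session reply and fall back to a forceful kill when it does not arrive. The following is a minimal sketch under that assumption, using only JDK classes; `StopWithFallback` and `ForceKill` are hypothetical names and do not reflect how Livy is actually implemented.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class StopWithFallback {
    // Hypothetical hook; a real server might ask Yarn to kill the application here.
    interface ForceKill {
        void kill();
    }

    // Waits up to timeoutMs for the end-session reply. If the reply times out
    // or fails, invokes forceKill. Returns true only if the graceful stop succeeded.
    static boolean stop(Future<?> endSessionReply, long timeoutMs, ForceKill forceKill) {
        try {
            endSessionReply.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            forceKill.kill();
            return false;
        } catch (InterruptedException e) {
            // Preserve the interrupt status, then still clean up the session.
            Thread.currentThread().interrupt();
            forceKill.kill();
            return false;
        }
    }

    public static void main(String[] args) {
        // A future that never completes stands in for a lost end-session reply.
        Future<Void> lostReply = new CompletableFuture<>();
        boolean graceful = stop(lostReply, 100,
                () -> System.out.println("Reply timed out, killing session forcefully"));
        System.out.println("graceful stop: " + graceful);
    }
}
```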
> RSCClient can loose its session/interpreter reference leaking job sessions
> --------------------------------------------------------------------------
>
> Key: LIVY-324
> URL: https://issues.apache.org/jira/browse/LIVY-324
> Project: Livy
> Issue Type: Bug
> Components: REPL, RSC
> Affects Versions: 0.3
> Reporter: Pat White
> Priority: Major
>
> Seeing an issue where Livy seems unable to kill a PySpark session due to
> disconnects with its session's ProcessInterpreter.
> User observation: using Hue Notebooks to launch a PySpark session, the
> session is initiated and goes to running. The user's job has errors and goes
> to idle, but the session remains running for 24+ hours. Usually we see an
> idle session automatically killed after 1 hour.
> In Yarn task log, we see the AM start ok and SparkContext comes up, user's
> job runs with errors and SparkContext goes to idle, the Yarn job then stays
> idle for 1 hour, at which point PythonInterpreter calls shutdown:
> INFO PythonInterpreter: Shutting down process
> Nothing more is seen in the Yarn log, Yarn job remains running.
> In Livy log we see the following timeout exception when trying the shutdown:
> INFO com.cloudera.livy.Logging$class.info(40): Stopping InteractiveSession 0...
> WARN com.cloudera.livy.rsc.RSCClient.stop(220): Exception while waiting for end session reply.
> java.util.concurrent.TimeoutException
> The Livy call trace looks like it is trying:
> -> repl/ProcessInterpreter.scala close() - Yarn log showing "Shutting down process"
> -> repl/PythonInterpreter.scala sendShutdownRequest()
> -> livy/server/interactive/InteractiveSession.scala stopSession()
> -> livy/rsc/RSCClient.java stop() - the client getting the timeout error:
>    livy.rsc.RSCClient.stop(220): Exception while waiting for end session reply
> Not sure what happened, it appears that the client lost its reference to its
> session/ProcessInterpreter and can no longer complete a session close attempt.