[ https://issues.apache.org/jira/browse/LIVY-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gyorgy Gal updated LIVY-324:
----------------------------
Fix Version/s: 0.10.0
(was: 0.9.0)
This issue has been moved to the 0.10.0 release as part of a bulk update. If
you feel it was moved inappropriately, feel free to provide justification
and reset the Fix Version to 0.9.0.
> RSCClient can lose its session/interpreter reference, leaking job sessions
> --------------------------------------------------------------------------
>
> Key: LIVY-324
> URL: https://issues.apache.org/jira/browse/LIVY-324
> Project: Livy
> Issue Type: Bug
> Components: REPL, RSC
> Affects Versions: 0.3
> Reporter: Pat White
> Priority: Major
> Fix For: 0.10.0
>
>
> Livy seems unable to kill a PySpark session after becoming disconnected
> from the session's ProcessInterpreter.
> User observation: a PySpark session launched from Hue Notebooks is initiated
> and goes to running. The user's job has errors and the session goes to idle,
> but it then remains running for 24+ hours. Usually an idle session is
> killed automatically after 1 hour.
> In the Yarn task log, the AM starts OK and the SparkContext comes up. The
> user's job runs with errors and the SparkContext goes to idle. The Yarn job
> then stays idle for 1 hour, at which point PythonInterpreter calls shutdown:
> INFO PythonInterpreter: Shutting down process
> Nothing more appears in the Yarn log, and the Yarn job remains running.
> In the Livy log, the following timeout exception appears when the shutdown is
> attempted:
> INFO com.cloudera.livy.Logging$class.info(40): Stopping InteractiveSession 0...
> WARN com.cloudera.livy.rsc.RSCClient.stop(220): Exception while waiting for end session reply.
> java.util.concurrent.TimeoutException
> The Livy call sequence appears to be:
> -> repl/ProcessInterpreter.scala close() - Yarn log showing "Shutting down process"
> -> repl/PythonInterpreter.scala sendShutdownRequest()
> -> livy/server/interactive/InteractiveSession.scala stopSession()
> -> livy/rsc/RSCClient.java stop() - the client getting the timeout error:
> livy.rsc.RSCClient.stop(220): Exception while waiting for end session reply
> It is not clear what happened. It appears that the client lost its reference
> to its session/ProcessInterpreter and can no longer complete a session close
> attempt.
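
For illustration only, the failure mode described above (a stop request whose
reply never arrives, leaving the remote job running) can be sketched in a few
lines of Python. This is not Livy's code: SessionClient, send_end_session,
on_end_session_reply and force_kill are hypothetical names, and the fallback
step is an assumption about one way such a leak could be avoided, not a
description of what RSCClient does.

```python
import threading


class SessionClient:
    """Toy model of a client that asks a remote session to shut down.

    All names here are hypothetical; none of them come from Livy.
    """

    def __init__(self, send_end_session, force_kill):
        # send_end_session(client): delivers the "end session" request.
        # force_kill(): assumed fallback that tears the job down directly.
        self._send_end_session = send_end_session
        self._force_kill = force_kill
        self._reply = threading.Event()

    def on_end_session_reply(self):
        # Called when the remote side acknowledges the shutdown.
        self._reply.set()

    def stop(self, timeout_s=10.0):
        self._send_end_session(self)
        # Event.wait returns False if the timeout expires with no reply.
        if self._reply.wait(timeout_s):
            return "stopped"
        # No reply: without this step the remote job would be left running.
        self._force_kill()
        return "killed"
```

In this sketch a missing reply leads to an explicit fallback instead of only a
logged warning. In a real deployment the fallback might be a request to the
cluster manager to kill the application, but that is a design assumption
rather than something the report above confirms.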