Mathew created ZEPPELIN-3455:
--------------------------------
Summary: Zeppelin crashes if the Spark interpreter is restarted while
hanging (and there is no hang timeout)
Key: ZEPPELIN-3455
URL: https://issues.apache.org/jira/browse/ZEPPELIN-3455
Project: Zeppelin
Issue Type: Bug
Components: Interpreters
Affects Versions: 0.7.3
Reporter: Mathew
Fix For: 0.8.0
*Issue*:
If a user has not run kinit with their keytab and attempts to run a %spark
paragraph, the paragraph hangs indefinitely. If they then try to restart the
Spark interpreter, the whole of Zeppelin crashes.
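As a possible stopgap, the missing ticket could be detected before a paragraph
is submitted. The following is a minimal sketch in Python, not part of
Zeppelin. It assumes the MIT Kerberos klist binary is on the PATH and relies on
"klist -s", which prints nothing and exits non-zero when the credential cache
cannot be read or has expired. The function name has_valid_ticket and its
parameters are hypothetical.

```python
import subprocess


def has_valid_ticket(command=("klist", "-s"), timeout_seconds=10):
    """Return True if the check command exits with status 0.

    With the default command this means the current user has a readable,
    unexpired Kerberos credential cache. A missing binary or a check that
    itself hangs is treated as "no valid ticket".
    """
    try:
        completed = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
            check=False,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return completed.returncode == 0
```

The check would have to run as the impersonated user (for example through the
same sudo wrapper as ZEPPELIN_IMPERSONATE_CMD), since each user has their own
credential cache.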
*Environment:*
* Zeppelin 0.7.3
* Spark 2.2
* Yarn-Client (Kerberized)
* User Impersonation Enabled (Per user, in isolated process)
* Shiro authenticating users through AD
* ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER=false
* ZEPPELIN_IMPERSONATE_CMD='sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c '
While this might seem like a crazy setup, it is extremely common in enterprise
environments, where users have differing permissions in the Hadoop cluster.
(I am aware that Zeppelin can proxy users if it has its own keytab, but many
Zeppelin users cannot do that for now.)
*Things to Fix:*
* Firstly, there appears to be no timeout for a failed Spark interpreter
initialization, so it hangs forever.
* Secondly, restarting the Spark interpreter while it is in the hanging state
crashes Zeppelin for everyone. (Sometimes it comes back after 20+ minutes.)
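To illustrate the first point, the kind of bounded wait being asked for can be
sketched as follows. This is a generic Python illustration, not Zeppelin's
actual interpreter launch code. The names init_with_timeout and
InterpreterInitTimeout are hypothetical, and init_fn stands in for whatever
blocking call opens the Spark interpreter.

```python
import threading


class InterpreterInitTimeout(Exception):
    """Raised when initialization does not finish within the deadline."""


def init_with_timeout(init_fn, timeout_seconds):
    """Run init_fn in a worker thread and wait at most timeout_seconds.

    Returns init_fn's result, re-raises its exception, or raises
    InterpreterInitTimeout if it is still running at the deadline.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = init_fn()
        except Exception as exc:  # handed back to the caller below
            outcome["error"] = exc

    # Daemon thread: a stuck initialization cannot block process exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_seconds)

    if thread.is_alive():
        raise InterpreterInitTimeout(
            "initialization did not finish within %s seconds" % timeout_seconds
        )
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

Note that this sketch only stops waiting. It does not stop the stuck work
itself, so a real fix would also need to terminate the hung interpreter
process, which is presumably where the restart problem in the second point
comes in.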