Benoy Antony created ZEPPELIN-2355:
--------------------------------------
Summary: Fix race conditions while cancelling a paragraph
Key: ZEPPELIN-2355
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2355
Project: Zeppelin
Issue Type: Bug
Reporter: Benoy Antony
Assignee: Benoy Antony
I ran into a few issues while testing the cancel functionality for a Livy
paragraph. The tests were performed on a real YARN cluster with Livy running in
cluster mode. On a real cluster, it takes some time to launch the application
and start executing the paragraphs. The current cancel function has a few
concurrency issues.
The visible symptom was that the user keeps cancelling at first, but the
paragraph still runs to completion.
{code}
@Override
public void cancel(InterpreterContext context) {
  if (livyVersion.isCancelSupported()) {
    String paraId = context.getParagraphId();
    Integer stmtId = paragraphId2StmtIdMap.get(paraId);
    try {
      if (stmtId != null) {
        cancelStatement(stmtId);
      }
    } catch (LivyException e) {
      LOGGER.error("Fail to cancel statement " + stmtId + " for paragraph " +
          paraId, e);
    } finally {
      paragraphId2StmtIdMap.remove(paraId);
    }
  } else {
    LOGGER.warn("cancel is not supported for this version of livy: " +
        livyVersion);
  }
}
{code}
Issue 1: The variable livyVersion is set in initLivySession(). The thread
executing cancel may not see the value and may therefore throw a
NullPointerException.
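
One possible way to address Issue 1, shown only as a minimal sketch: declare the field volatile so that a write from the session-initialising thread is visible to the cancel thread, and read it once into a local with a null check. The class names LivyVersionSketch and VolatileVersionSketch, and the String return value of cancel(), are hypothetical stand-ins and not the actual Zeppelin code.

```java
// Hypothetical stand-in for the real version object; only isCancelSupported() is modelled.
class LivyVersionSketch {
    private final boolean cancelSupported;

    LivyVersionSketch(boolean cancelSupported) {
        this.cancelSupported = cancelSupported;
    }

    boolean isCancelSupported() {
        return cancelSupported;
    }
}

// Hypothetical stand-in for the interpreter; only the handling of the field matters here.
class VolatileVersionSketch {
    // volatile: a write made in initLivySession() is visible to any thread that reads it later.
    private volatile LivyVersionSketch livyVersion;

    void initLivySession(LivyVersionSketch detected) {
        this.livyVersion = detected;
    }

    // Returns a description of the outcome (an assumption made so the sketch is easy to check).
    String cancel() {
        // Read the field once so the null check and the later use see the same value.
        LivyVersionSketch version = livyVersion;
        if (version == null) {
            return "session not initialised";
        }
        if (!version.isCancelSupported()) {
            return "cancel not supported";
        }
        return "cancel requested";
    }
}
```

The null check covers the case where cancel arrives before initLivySession() has run at all; volatile covers the case where it has run but the write has not been published to the cancelling thread.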
Issue 2: cancel is a no-op if the statement id is not yet available. A
significant amount of time (possibly up to a minute) may pass before the
statement id is available. The user needs to keep cancelling until then, and
has no real way to tell when the statement id becomes available.
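
A rough sketch of one way Issue 2 could be handled: remember the cancel request, and honour it as soon as the statement id is registered. Everything except the map name paragraphId2StmtIdMap is an assumption made for illustration (the class PendingCancelSketch, the set paragraphsToCancel, the method onStatementCreated, and the list that records cancelled ids in place of a real call to Livy).

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch; not the actual interpreter class.
class PendingCancelSketch {
    private final ConcurrentMap<String, Integer> paragraphId2StmtIdMap = new ConcurrentHashMap<>();
    // Paragraphs whose cancel request has not been acted on yet (assumed helper state).
    private final Set<String> paragraphsToCancel = ConcurrentHashMap.newKeySet();
    // Stands in for the remote cancel call so the sketch can be run without a Livy server.
    final List<Integer> cancelledStatements = new CopyOnWriteArrayList<>();

    // Called from the thread handling the user's cancel request.
    void cancel(String paraId) {
        paragraphsToCancel.add(paraId);                 // 1. record the request first
        Integer stmtId = paragraphId2StmtIdMap.get(paraId);
        // 2. if the id is already known, whichever thread removes the request performs the cancel
        if (stmtId != null && paragraphsToCancel.remove(paraId)) {
            cancelStatement(stmtId);
        }
    }

    // Called from the thread running the paragraph once the statement id is known.
    void onStatementCreated(String paraId, int stmtId) {
        paragraphId2StmtIdMap.put(paraId, stmtId);      // 1. publish the id first
        if (paragraphsToCancel.remove(paraId)) {        // 2. then honour an earlier cancel request
            cancelStatement(stmtId);
        }
    }

    // Clears per-paragraph state so a stale request cannot cancel a later run.
    void onParagraphFinished(String paraId) {
        paragraphId2StmtIdMap.remove(paraId);
        paragraphsToCancel.remove(paraId);
    }

    private void cancelStatement(int stmtId) {
        cancelledStatements.add(stmtId);
    }
}
```

The ordering is the point of the sketch: the cancel side records its request before looking up the id, and the paragraph side publishes the id before checking for a request, so at least one of the two threads sees the other's write, and the atomic remove ensures only one of them issues the cancel for a given request.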
Issue 3: The SQL paragraph cannot be cancelled. This can be fixed by changing
LivySparkSQLInterpreter to invoke cancel on the underlying LivySparkInterpreter.
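
The delegation proposed for Issue 3 might look roughly like the following. The types here are simplified, hypothetical stand-ins (CancellableSketch, SparkInterpreterSketch, SparkSqlInterpreterSketch); the real interpreters take an InterpreterContext and have a much larger API.

```java
// Hypothetical minimal interface; the real cancel method takes an InterpreterContext.
interface CancellableSketch {
    void cancel(String paragraphId);
}

// Stand-in for the underlying Spark interpreter, which already knows how to cancel.
class SparkInterpreterSketch implements CancellableSketch {
    String lastCancelledParagraph; // recorded only so the sketch can be checked

    @Override
    public void cancel(String paragraphId) {
        lastCancelledParagraph = paragraphId;
    }
}

// Stand-in for the SQL interpreter: it forwards cancel instead of doing nothing.
class SparkSqlInterpreterSketch implements CancellableSketch {
    private final CancellableSketch sparkInterpreter;

    SparkSqlInterpreterSketch(CancellableSketch sparkInterpreter) {
        this.sparkInterpreter = sparkInterpreter;
    }

    @Override
    public void cancel(String paragraphId) {
        // The SQL statement runs through the underlying interpreter, so cancel goes there too.
        sparkInterpreter.cancel(paragraphId);
    }
}
```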
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)