Liyin Tang created SPARK-16244:
----------------------------------
Summary: Failed job/stage couldn't stop JobGenerator immediately.
Key: SPARK-16244
URL: https://issues.apache.org/jira/browse/SPARK-16244
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.5.2
Reporter: Liyin Tang
This streaming job has a very simple DAG: each batch has only 1 job, and each
job has only 1 stage.
Based on the logs, we observed a potential race condition. Stage 1
failed due to task failures, which triggered failJobAndIndependentStages.
Meanwhile, the next stage (job), 2, was submitted and successfully ran
a few tasks before the JobGenerator was stopped via the shutdown hook.
Because the next job completed a few tasks successfully, it
corrupted the checkpoints / offset management.
I will attach the log to the JIRA as well.
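
To make the ordering concrete, here is a minimal sketch of the sequence
described above. It is a toy, single-threaded model, not Spark's code: the
function run_batches, its parameters, and the offsets are all hypothetical,
and the "one batch late" stop is an assumption standing in for the delay
between the stage failure and the shutdown hook stopping the JobGenerator.

```python
# Toy, single-threaded model of the ordering described in this report.
# All names are hypothetical; this does not use or mirror Spark's API.

def run_batches(batches, fail_fast):
    """Return the offsets committed before the generator stops.

    batches: list of (offset, succeeds) pairs, one job per batch.
    fail_fast: if True, a failed job blocks later submissions right away;
               if False, the stop only takes effect one batch later,
               imitating a stop that arrives via the shutdown hook.
    """
    committed = []
    failed = False          # set as soon as a job fails
    stop_effective = False  # set when the generator has really stopped
    for offset, succeeds in batches:
        if stop_effective or (fail_fast and failed):
            break
        if failed:
            # Delayed stop: this batch still slips through once.
            stop_effective = True
        if succeeds:
            committed.append(offset)
        else:
            failed = True
    return committed
```

With batches [(100, True), (200, False), (300, True), (400, True)], the
delayed stop commits offsets 100 and 300, so the failed batch at 200 is
skipped over; with fail_fast=True only offset 100 is committed.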