[ https://issues.apache.org/jira/browse/SPARK-25211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590222#comment-16590222 ]

Apache Spark commented on SPARK-25211:
--------------------------------------

User 'liutang123' has created a pull request for this issue:
https://github.com/apache/spark/pull/22202

> speculation and fetch failed result in hang of job
> --------------------------------------------------
>
>                 Key: SPARK-25211
>                 URL: https://issues.apache.org/jira/browse/SPARK-25211
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.2
>            Reporter: Lijia Liu
>            Priority: Major
>
> In the current `DAGScheduler.handleTaskCompletion` code, when a ShuffleMapStage
> that has an active job is not in `runningStages` and its `pendingPartitions` is
> empty, the job of this ShuffleMapStage will never complete.
> **Consider the following scenario:**
> 1. Stage 0 runs and generates shuffle output data.
> 2. Stage 1 reads the output from stage 0 and generates more shuffle data. It
> has two attempts running for the same partition: ShuffleMapTask0 and its
> speculative copy ShuffleMapTask0.1.
> 3. ShuffleMapTask0 fails to fetch blocks and sends a `FetchFailed` to the
> driver. The driver resubmits stage 0 and stage 1, placing stage 0 in
> `runningStages` and stage 1 in `waitingStages`.
> 4. ShuffleMapTask0.1 finishes successfully and sends `Success` back to the
> driver. The driver adds the `MapStatus` to the set of output locations of
> stage 1. Because stage 1 is not in `runningStages`, the job is not completed.
> 5. Stage 0 completes and the driver runs stage 1. However, because the output
> set of stage 1 is already complete, the driver submits no tasks and marks
> stage 1 as complete immediately. Because job completion relies on a
> `CompletionEvent`, and no further `CompletionEvent` will ever arrive, the job
> hangs.
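
The five steps can be replayed in a toy model of the scheduler bookkeeping. This is a minimal sketch, written in Python only for brevity, that tracks nothing but `runningStages`, `waitingStages`, `pendingPartitions`, and the registered map outputs. `ToyDAGScheduler`, `ToyShuffleMapStage`, `run_scenario`, and the `complete_job_on_empty_submit` flag are hypothetical names, not Spark APIs. The flag is an assumed guard (finish the job when a resubmitted stage turns out to have no missing partitions) included only to show where the missing completion would have to happen; it is not necessarily what the linked pull request changes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShuffleMapStage:
    # Hypothetical stand-in for Spark's ShuffleMapStage; only the fields
    # mentioned in the issue are modelled.
    stage_id: int
    num_partitions: int
    outputs: set = field(default_factory=set)             # partitions with a registered MapStatus
    pending_partitions: set = field(default_factory=set)  # partitions still awaited


class ToyDAGScheduler:
    """Toy model of the bookkeeping described in SPARK-25211 (not Spark code)."""

    def __init__(self, complete_job_on_empty_submit=False):
        self.running_stages = set()
        self.waiting_stages = set()
        self.job_finished = False
        # Hypothetical guard; False reproduces the hang from the report.
        self.complete_job_on_empty_submit = complete_job_on_empty_submit

    def handle_fetch_failed(self, parent_id, stage):
        # Step 3: the parent goes back to running, the failing stage waits.
        self.running_stages.discard(stage.stage_id)
        self.running_stages.add(parent_id)
        self.waiting_stages.add(stage.stage_id)

    def handle_map_task_success(self, stage, partition):
        # Step 4: the map output is registered unconditionally...
        stage.outputs.add(partition)
        stage.pending_partitions.discard(partition)
        # ...but the job is only finished if the stage is currently running.
        if stage.stage_id in self.running_stages and not stage.pending_partitions:
            self.running_stages.discard(stage.stage_id)
            self.job_finished = True

    def handle_parent_finished(self, parent_id, stage):
        # Step 5: the parent completes and the waiting stage is resubmitted.
        self.running_stages.discard(parent_id)
        self.waiting_stages.discard(stage.stage_id)
        missing = set(range(stage.num_partitions)) - stage.outputs
        if not missing:
            # Nothing to launch, so no CompletionEvent will ever arrive for this stage.
            if self.complete_job_on_empty_submit:
                self.job_finished = True
            return
        self.running_stages.add(stage.stage_id)
        stage.pending_partitions = missing


def run_scenario(complete_job_on_empty_submit=False):
    sched = ToyDAGScheduler(complete_job_on_empty_submit)
    # Stage 1 has one partition, attempted by ShuffleMapTask0 and its
    # speculative copy ShuffleMapTask0.1.
    stage1 = ToyShuffleMapStage(stage_id=1, num_partitions=1, pending_partitions={0})
    sched.running_stages.add(1)

    sched.handle_fetch_failed(0, stage1)      # step 3: ShuffleMapTask0 hits FetchFailed
    sched.handle_map_task_success(stage1, 0)  # step 4: ShuffleMapTask0.1 succeeds
    sched.handle_parent_finished(0, stage1)   # step 5: stage 0 completes
    return sched.job_finished


print(run_scenario())      # False: the job never finishes in this model
print(run_scenario(True))  # True: the hypothetical guard finishes it
```

In the model, the success in step 4 is recorded but ignored for job completion because stage 1 is only waiting, and the resubmission in step 5 launches no tasks, so nothing is left that could flip `job_finished`.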



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
