[ 
https://issues.apache.org/jira/browse/SPARK-49762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuangxian updated SPARK-49762:
-------------------------------
    Affects Version/s: 2.4.1
                           (was: 3.5.1)

> How to Handle Task Timeouts and Placeholder Allocation in Spark Shuffle 
> Write Phase
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-49762
>                 URL: https://issues.apache.org/jira/browse/SPARK-49762
>             Project: Spark
>          Issue Type: Wish
>          Components: Spark Core
>    Affects Versions: 2.4.1
>            Reporter: zhuangxian
>            Priority: Major
>
> During the Spark shuffle write phase, the driver launches a task to write a 
> partition and allocates a placeholder for that task's commit. However, when 
> the data volume is large, the task may fail to complete its commit because 
> of network issues or disk failures. In that case, how should the driver 
> detect the task timeout and launch a new task to commit the same partition? 
> Starting a new task also raises two issues:
> 1. The placeholder is still held by the old task, so the new task cannot 
> obtain it for its own commit. How should a placeholder be allocated to the 
> new task?
> 2. How can the old task exit safely so that it does not commit the same data 
> as the new task?
> The commit protocol is two-phase commit (2PC). The main flow is 
> placeholder->move->commit; the specific implementation is here: 
> [github.com/apache/spark/blob/master/core/src/main/scala/org/…|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala]
> The commit algorithm I use is v2.
> I tried searching in the history but could not find a solution to this 
> problem. I look forward to discussing this issue with community members.
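
To make the two questions above concrete, here is a minimal Python sketch of one possible arbitration scheme. It is a toy model, not Spark's implementation: the class `CommitCoordinator`, its methods, and the `lease_seconds` timeout are all hypothetical names introduced for illustration, and the time-based lease is an assumption rather than a documented Spark behavior. The idea is that the driver grants the placeholder to one attempt per partition, releases it when that attempt fails or its lease expires, and refuses to finalize a commit from an attempt that no longer holds the placeholder.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class CommitCoordinator:
    """Toy driver-side arbiter: at most one attempt per partition may commit."""

    lease_seconds: float = 30.0                   # hypothetical timeout, not a Spark setting
    clock: Callable[[], float] = time.monotonic   # injectable so the lease can be tested
    # partition -> (attempt holding the placeholder, time it was granted)
    _holders: Dict[int, Tuple[int, float]] = field(default_factory=dict)
    # partition -> attempt whose commit was finalized
    _committed: Dict[int, int] = field(default_factory=dict)

    def can_commit(self, partition: int, attempt: int) -> bool:
        """Grant the placeholder to one attempt; deny all competing attempts."""
        if partition in self._committed:
            return False
        holder = self._holders.get(partition)
        # Assumption: a holder that has been silent longer than the lease is
        # treated as timed out, which frees the placeholder for a new attempt.
        if holder is not None and self.clock() - holder[1] > self.lease_seconds:
            del self._holders[partition]
            holder = None
        if holder is None:
            self._holders[partition] = (attempt, self.clock())
            return True
        return holder[0] == attempt

    def task_failed(self, partition: int, attempt: int) -> None:
        """Release the placeholder if the failed attempt was holding it."""
        holder = self._holders.get(partition)
        if holder is not None and holder[0] == attempt:
            del self._holders[partition]

    def finish_commit(self, partition: int, attempt: int) -> bool:
        """Finalize only if the attempt still holds the placeholder."""
        holder = self._holders.get(partition)
        if partition in self._committed or holder is None or holder[0] != attempt:
            return False  # stale attempt: it must discard its output and exit
        self._committed[partition] = attempt
        return True


# Usage with a fake clock: attempt 0 stalls, attempt 1 takes over after the lease.
now = [0.0]
coord = CommitCoordinator(lease_seconds=30.0, clock=lambda: now[0])
granted_first = coord.can_commit(partition=7, attempt=0)
denied_early = coord.can_commit(partition=7, attempt=1)
now[0] = 31.0
granted_late = coord.can_commit(partition=7, attempt=1)
stale_finish = coord.finish_commit(partition=7, attempt=0)
fresh_finish = coord.finish_commit(partition=7, attempt=1)
```

In this sketch the old task learns that it lost the placeholder only when it calls `finish_commit`, so its move step would need to target an attempt-specific location that can be discarded. Whether that is achievable with the v2 commit algorithm mentioned above is exactly the open question in this issue.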



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
