[jira] [Commented] (TEZ-3910) Single node can cause Tez job to fail during shuffle

JoeXu (Jira) Tue, 01 Jul 2025 22:20:06 -0700


    [ 
https://issues.apache.org/jira/browse/TEZ-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987322#comment-17987322
 ]


JoeXu commented on TEZ-3910:
----------------------------

Hi [~jeagles] 


Thanks for sharing the details. Could you share any information on why this 
Jira has remained unresolved for so long? 
It looks like the patch broke some tests. Is there something blocking the patch?

> Single node can cause Tez job to fail during shuffle
> ----------------------------------------------------
>
>                 Key: TEZ-3910
>                 URL: https://issues.apache.org/jira/browse/TEZ-3910
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.9.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Major
>         Attachments: TEZ-3910.001.patch, TEZ-3910.002.patch, 
> TEZ-3910.003.patch, TEZ-3910.004.patch, TEZ-3910.005.patch
>
>
> There is a race in which a downstream task that hits fetch failures caused by 
> bad output from an upstream task can keep blaming itself for the failure 
> before the AM can re-run the offending upstream task and fix the fetch 
> failure. As a result, the failure of a single node can cause the DAG to fail.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
