[ https://issues.apache.org/jira/browse/TEZ-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421761#comment-15421761 ]

Zhiyuan Yang commented on TEZ-3397:
-----------------------------------

Closing this for now. Currently, if a destination task keeps reporting errors
for longer than a time limit, the source task is re-executed. This is good
enough for now.
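The time-limit behaviour described in the comment can be sketched roughly as follows. This is a hypothetical illustration, not Tez's actual code: the class name `ReexecutionTimer`, the methods `onErrorReport` and `onFetchSuccess`, and the 30-second limit are all assumptions. The sketch remembers when each destination task first reported an error against one source task and signals re-execution once that destination has kept reporting for at least the limit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Tez API: tracks error reports against ONE source task.
public class ReexecutionTimer {
    private final long limitMillis;
    // destination task id -> time (ms) of its first error report in the current streak
    private final Map<String, Long> firstReport = new HashMap<>();

    public ReexecutionTimer(long limitMillis) {
        this.limitMillis = limitMillis;
    }

    /** Records an error report; returns true if the source task should be re-executed. */
    public boolean onErrorReport(String destinationTaskId, long nowMillis) {
        long first = firstReport.computeIfAbsent(destinationTaskId, k -> nowMillis);
        return nowMillis - first >= limitMillis;
    }

    /** Clears the streak once the destination task fetches the output successfully. */
    public void onFetchSuccess(String destinationTaskId) {
        firstReport.remove(destinationTaskId);
    }

    public static void main(String[] args) {
        ReexecutionTimer timer = new ReexecutionTimer(30_000); // assumed 30 s limit
        System.out.println(timer.onErrorReport("dst_1", 0));      // false
        System.out.println(timer.onErrorReport("dst_1", 10_000)); // false
        System.out.println(timer.onErrorReport("dst_1", 30_000)); // true
    }
}
```

Keying the timer on the first report of a streak means a single transient error never triggers re-execution on its own. Only persistent failure from a destination does.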

> Better fault tolerance heuristics for custom edge
> -------------------------------------------------
>
>                 Key: TEZ-3397
>                 URL: https://issues.apache.org/jira/browse/TEZ-3397
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Zhiyuan Yang
>            Assignee: Zhiyuan Yang
>
> Today, a source task calculates its failure fraction by dividing the number
> of unique destination tasks that report a failure by the number of
> destination tasks that depend on this source task. A better way is to divide
> the number of destination tasks that report a failure by the number of
> *unfinished* destination tasks that depend on the source task.
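The two heuristics from the description can be compared with a small sketch. The class and method names are hypothetical. The sketch also assumes that only failure reports from still-unfinished destinations count toward the proposed numerator, which the description implies but does not state. With 10 dependent destinations, 8 of them finished and the remaining 2 reporting failures, the current formula gives 2/10 = 0.2. The proposed one gives 2/2 = 1.0.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the two failure-fraction heuristics described in TEZ-3397.
public class FailureFraction {
    /** Current heuristic: unique failing destinations / all dependent destinations. */
    static double currentFraction(Set<String> failedReporters, Set<String> dependents) {
        if (dependents.isEmpty()) return 0.0;
        return (double) failedReporters.size() / dependents.size();
    }

    /** Proposed heuristic: failing destinations / unfinished dependent destinations. */
    static double proposedFraction(Set<String> failedReporters, Set<String> dependents,
                                   Set<String> finished) {
        Set<String> unfinished = new HashSet<>(dependents);
        unfinished.removeAll(finished);
        if (unfinished.isEmpty()) return 0.0;
        // Assumption: only reports from destinations that are still unfinished count.
        Set<String> failingUnfinished = new HashSet<>(failedReporters);
        failingUnfinished.retainAll(unfinished);
        return (double) failingUnfinished.size() / unfinished.size();
    }

    public static void main(String[] args) {
        Set<String> dependents = new HashSet<>();
        Set<String> finished = new HashSet<>();
        for (int i = 0; i < 10; i++) {
            dependents.add("dst_" + i);
            if (i < 8) finished.add("dst_" + i);
        }
        Set<String> failed = new HashSet<>();
        failed.add("dst_8");
        failed.add("dst_9");
        System.out.println(currentFraction(failed, dependents));           // 0.2
        System.out.println(proposedFraction(failed, dependents, finished)); // 1.0
    }
}
```

The example shows why the denominator matters. Once most destinations have finished, the current formula dilutes the failures, so a source whose output is unreachable by every remaining consumer can still look healthy.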



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
