[
https://issues.apache.org/jira/browse/TEZ-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421761#comment-15421761
]
Zhiyuan Yang commented on TEZ-3397:
-----------------------------------
Closing this for now. Currently, if a destination task keeps reporting errors
for longer than a time limit, the source task is re-executed. This is good
enough for now.
> Better fault tolerance heuristics for custom edge
> -------------------------------------------------
>
> Key: TEZ-3397
> URL: https://issues.apache.org/jira/browse/TEZ-3397
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Zhiyuan Yang
> Assignee: Zhiyuan Yang
>
> Today, a source task calculates its failure fraction by dividing the number
> of unique destination tasks that report a failure by the number of
> destination tasks that depend on this source task. A better way is to divide
> the number of destination tasks that report a failure by the number of
> *unfinished* destination tasks that depend on the source task.
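
To make the difference between the two ratios concrete, here is a minimal sketch in Java. It is not the Tez implementation: the class name, the method names, and the use of plain integer task ids are all hypothetical and chosen only for illustration. It assumes the set of dependent destination tasks and the set of finished ones are known to the caller.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names come from the Tez code base.
class FailureFractionSketch {

    // Heuristic described as "today": unique reporting destinations
    // divided by all destinations that depend on the source task.
    static double currentFraction(Set<Integer> reporters, Set<Integer> dependents) {
        if (dependents.isEmpty()) {
            return 0.0; // no dependents, so nothing to compare against
        }
        return (double) reporters.size() / dependents.size();
    }

    // Proposed heuristic: same numerator, but the denominator counts only
    // dependents that have not finished yet.
    static double proposedFraction(Set<Integer> reporters,
                                   Set<Integer> dependents,
                                   Set<Integer> finished) {
        Set<Integer> unfinished = new HashSet<>(dependents);
        unfinished.removeAll(finished);
        if (unfinished.isEmpty()) {
            return 0.0; // every dependent finished, so no one is left waiting
        }
        return (double) reporters.size() / unfinished.size();
    }

    public static void main(String[] args) {
        // Assumed scenario: 10 dependents, 8 already finished, 2 reporting failure.
        Set<Integer> dependents = new HashSet<>(Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9));
        Set<Integer> finished = new HashSet<>(Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7));
        Set<Integer> reporters = new HashSet<>(Arrays.asList(8, 9));

        System.out.printf("current:  %.2f%n", currentFraction(reporters, dependents));
        System.out.printf("proposed: %.2f%n", proposedFraction(reporters, dependents, finished));
    }
}
```

In this assumed scenario the current ratio is 2/10, while the proposed ratio is 2/2: every destination task still waiting on the source output is failing, which the current ratio hides behind the tasks that already finished.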
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)