[ https://issues.apache.org/jira/browse/TEZ-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413834#comment-15413834 ]
Ming Ma commented on TEZ-3397:
------------------------------
Does this apply only to a specific custom edge scenario, or does it also apply
to SCATTER_GATHER?
I wonder whether this will increase the likelihood of false positives. For
example, the source vertex's {{TaskAttemptImpl}} holds the list of destination
tasks that have complained so far. Some of those complaints were caused by a
network issue a while back, and some of those destination tasks might have
succeeded since. If the source task attempt then gets another complaint from a
new destination task close to the end of the destination vertex's execution
(so that only a few destination tasks remain unfinished), the new heuristic
could mark the source task bad even though the actual issue lies with the
destination task.
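To make the false-positive concern concrete, here is a minimal Python sketch that models both heuristics as set-size ratios. The current heuristic divides by all dependent destination tasks, and the proposed one divides by only the unfinished ones. The function names, the 100-task destination vertex, and the 0.25 threshold are hypothetical illustrations. They are not Tez code or Tez defaults.

```python
from typing import Set

# Hypothetical model of the fetch-failure heuristic; names and the threshold
# are illustrative assumptions, not taken from Tez.
THRESHOLD = 0.25  # assumed cutoff for declaring the source output bad


def current_fraction(reporters: Set[int], dependents: Set[int]) -> float:
    """Unique reporting destinations / all dependent destinations."""
    return len(reporters) / len(dependents)


def proposed_fraction(reporters: Set[int], unfinished: Set[int]) -> float:
    """Reporting destinations / dependent destinations still unfinished."""
    if not unfinished:
        return 0.0  # assumption: no unfinished consumers, so nothing to blame
    return len(reporters) / len(unfinished)


# Scenario from the comment: 100 destination tasks depend on the source.
dependents = set(range(100))
# Tasks 0 and 1 complained earlier (a transient network issue) and have since
# succeeded; task 98 complains near the end of the destination vertex.
reporters = {0, 1, 98}
# Only tasks 97, 98 and 99 are still unfinished.
unfinished = {97, 98, 99}

cur = current_fraction(reporters, dependents)
new = proposed_fraction(reporters, unfinished)
print(f"current={cur:.2f} bad={cur >= THRESHOLD}")
print(f"proposed={new:.2f} bad={new >= THRESHOLD}")
```

With these assumed numbers, the current fraction stays at 0.03, far below the cutoff. The proposed fraction reaches 1.0 because the two stale reporters still count in the numerator but no longer count in the denominator.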
Another open question is how to test such a heuristic change, perhaps through
some sort of simulation.
> Better fault tolerance heuristics for custom edge
> -------------------------------------------------
>
> Key: TEZ-3397
> URL: https://issues.apache.org/jira/browse/TEZ-3397
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Zhiyuan Yang
> Assignee: Zhiyuan Yang
>
> Today, a source task calculates its failure fraction by dividing the number of
> unique destination tasks that report failure by the number of destination
> tasks that depend on this source task. A better way is to divide the number of
> destination tasks that report failure by the number of *unfinished*
> destination tasks that depend on the source task.
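The notation below is introduced here for illustration. $D(s)$ is the set of destination tasks that depend on source task $s$. $U(s) \subseteq D(s)$ is the subset still unfinished. $F(s) \subseteq D(s)$ is the set of unique destination tasks that have reported a failure for $s$. Using this notation, the two fractions described in the issue can be written as:

```latex
f_{\text{current}}(s) = \frac{|F(s)|}{|D(s)|},
\qquad
f_{\text{proposed}}(s) = \frac{|F(s)|}{|U(s)|},
\qquad
U(s) \subseteq D(s)
```

Since $|U(s)| \le |D(s)|$, we have $f_{\text{proposed}}(s) \ge f_{\text{current}}(s)$ whenever $U(s)$ is nonempty. The proposed fraction therefore crosses any fixed threshold no later than the current one. If $F(s)$ still contains tasks that have since finished, $f_{\text{proposed}}(s)$ can even exceed 1.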
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)