[jira] [Commented] (TEZ-3397) Better fault tolerance heuristics for custom edge

Ming Ma (JIRA) Tue, 09 Aug 2016 09:55:58 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413834#comment-15413834
 ]


Ming Ma commented on TEZ-3397:
------------------------------

Does this apply only to specific custom edge scenarios, or to SCATTER_GATHER 
as well?

I wonder whether this will increase the likelihood of false positives. For 
example, the source vertex's {{TaskAttemptImpl}} keeps the list of destination 
tasks that have complained so far. Some of those complaints were caused by a 
network issue a while back, and some of the complaining tasks might have 
succeeded since. If the source task attempt then gets another complaint from a 
new destination task close to the end of the destination vertex's run (when 
few destination tasks remain unfinished), the new heuristic could mark the 
source task as bad even though the actual problem is in the destination task.
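
To make the concern concrete, here is the arithmetic with made-up numbers (none 
of them come from a real job). Assume N = 100 destination tasks depend on the 
source task, |F| = 3 of them have complained so far (two old complaints from 
the network issue plus the new one), and only U = 4 destination tasks are 
still unfinished:

```latex
\[
  f_{\text{current}} = \frac{|F|}{N} = \frac{3}{100} = 0.03,
  \qquad
  f_{\text{proposed}} = \frac{|F|}{U} = \frac{3}{4} = 0.75
\]
```

With any plausible threshold between those two values, the same three 
complaints would be ignored today but would fail the source task under the new 
heuristic.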

Another question is how to test such a heuristic change, and whether that 
would rely on some sort of simulation.

> Better fault tolerance heuristics for custom edge
> -------------------------------------------------
>
>                 Key: TEZ-3397
>                 URL: https://issues.apache.org/jira/browse/TEZ-3397
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Zhiyuan Yang
>            Assignee: Zhiyuan Yang
>
> Today, a source task calculates the failure fraction by dividing the number 
> of unique destination tasks that report failure by the number of destination 
> tasks that depend on this source task. A better way is to divide the number 
> of destination tasks that report failure by the number of *unfinished* 
> destination tasks that depend on the source task.
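
The two calculations in the description can be sketched side by side as 
follows. This is only an illustration, not the actual Tez code: the class 
{{FailureFractionSketch}}, both method names, and the integer task ids are 
hypothetical, and the sketch assumes that every destination task that has ever 
complained stays in the reporter set, even if it has since finished.

```java
import java.util.HashSet;
import java.util.Set;

public class FailureFractionSketch {

    // Current heuristic as described: unique complaining destination tasks
    // divided by ALL destination tasks that depend on the source task.
    public static double currentFraction(Set<Integer> reporters, int totalDestinations) {
        if (totalDestinations <= 0) {
            return 0.0; // nothing depends on this source task
        }
        return (double) reporters.size() / totalDestinations;
    }

    // Proposed heuristic as described: the same numerator divided by the
    // destination tasks that are still UNFINISHED.
    // Assumption: reporters that have already finished are still counted.
    public static double proposedFraction(Set<Integer> reporters, int unfinishedDestinations) {
        if (unfinishedDestinations <= 0) {
            return 0.0; // no unfinished consumers left to protect
        }
        return (double) reporters.size() / unfinishedDestinations;
    }

    public static void main(String[] args) {
        // Hypothetical destination task ids; the Set keeps each reporter once.
        Set<Integer> reporters = new HashSet<>();
        reporters.add(7);
        reporters.add(42);
        reporters.add(42); // a repeated complaint from the same task

        System.out.println("current:  " + currentFraction(reporters, 100));
        System.out.println("proposed: " + proposedFraction(reporters, 4));
    }
}
```

Because the denominator shrinks as the destination vertex progresses, the 
proposed fraction for the same set of reporters can only grow over time, which 
is why it reacts faster late in the vertex.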



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
