[ 
https://issues.apache.org/jira/browse/TEZ-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165048#comment-17165048
 ] 

Mustafa Iman commented on TEZ-4206:
-----------------------------------

[~abstractdog] Yes, silly me, I did not check whether there was an existing issue. 
According to your comment here 
https://issues.apache.org/jira/browse/TEZ-4119?focusedCommentId=17024431&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17024431
 it is likely the same issue. I tried to explain this in a comment in the 
patch. The issue comes from these two factors:
 # We use a mock clock that advances 1 second at each tick.
 # LegacySpeculator artificially lengthens the cool-off period if the evaluation 
itself takes a long time. See (clock.getTime() - backgroundRunStartTime) at 
[https://github.com/apache/tez/blob/2d7c60849adf3ed62f36f00e161c5d55962206f5/tez-dag/src/main/java/org/apache/tez/dag/app/dag/speculation/legacy/LegacySpeculator.java#L256]

If a mock clock tick (1 second) happens while computeSpeculations is in progress, 
the speculator thinks computeSpeculations took 1 second to run. It therefore 
waits 1 second before the second attempt. The problem is that the original task 
in the test completes before the speculator has a second chance to speculate 
the task.
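
To illustrate the interaction described above, here is a minimal sketch. It is 
not the actual Tez code: ManualClock and coolOffMillis are hypothetical names, 
and the max(...) formula is only an assumption about how the elapsed time 
(clock.getTime() - backgroundRunStartTime) stretches the wait.

```java
import java.util.concurrent.atomic.AtomicLong;

public class MockClockRaceSketch {

    /** Hypothetical stand-in for the test's mock clock: time moves only when tick() is called. */
    static final class ManualClock {
        private final AtomicLong nowMillis = new AtomicLong();

        long getTime() {
            return nowMillis.get();
        }

        void tick(long millis) {
            nowMillis.addAndGet(millis);
        }
    }

    /**
     * Assumed shape of the cool-off computation: wait at least minWaitMillis,
     * but never less than the time the evaluation appeared to take.
     */
    static long coolOffMillis(ManualClock clock, long minWaitMillis, Runnable evaluation) {
        long backgroundRunStartTime = clock.getTime();
        evaluation.run(); // stands in for computeSpeculations()
        long elapsed = clock.getTime() - backgroundRunStartTime;
        return Math.max(minWaitMillis, elapsed);
    }

    public static void main(String[] args) {
        ManualClock clock = new ManualClock();
        long minWait = 10; // illustrative value, not a Tez default

        // No tick lands inside the evaluation: elapsed time is 0, the wait stays at the minimum.
        long quiet = coolOffMillis(clock, minWait, () -> { });

        // A 1-second tick lands inside the evaluation: the wait is stretched to 1 second.
        long raced = coolOffMillis(clock, minWait, () -> clock.tick(1000));

        System.out.println("no tick during evaluation: wait " + quiet + " ms");
        System.out.println("tick during evaluation:    wait " + raced + " ms");
    }
}
```

In the second case the evaluation did no real work, yet the speculator would 
back off for a full mock second, which is long enough for the original task 
attempt in the test to finish first.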

There is no recent work on TEZ-4119. I think we can merge this and close 
TEZ-4119 as a duplicate.

> TestSpeculation.testBasicSpeculationPerVertexConf is flaky
> ----------------------------------------------------------
>
>                 Key: TEZ-4206
>                 URL: https://issues.apache.org/jira/browse/TEZ-4206
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Mustafa Iman
>            Assignee: Mustafa Iman
>            Priority: Major
>         Attachments: TEZ-4206.1.patch
>
>
> The test is flaky due to a timing issue in MockDAGAppMaster's clock and 
> LegacySpeculator
> [https://builds.apache.org/job/PreCommit-TEZ-Build/491/]
> [https://builds.apache.org/job/PreCommit-TEZ-Build/492/]
> [https://builds.apache.org/job/PreCommit-TEZ-Build/493/]
>  



