[jira] [Commented] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

Jonathan Turner Eagles (Jira) Fri, 24 Jan 2020 11:25:24 -0800


    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023195#comment-17023195
 ]


Jonathan Turner Eagles commented on MAPREDUCE-7259:
---------------------------------------------------

[~ahussein], thanks for the patch.

I don't understand how a number of the changes contribute to making this test 
less flaky. Could you help me understand how each of them makes the test more 
stable?

MRApp. The changes in MRApp are not to test code but to runtime code, so they 
have an impact on jobs. Since the summary of this jira describes a flaky test, 
developers will not expect changes to non-test code. The changes here are 
three-fold: the wait time for jobs is extended from 20 seconds to 100 seconds, 
the interval for checking state is reduced from 500 millis to 100 millis, and 
some refactoring was done.

Can you post some analysis showing that these changes help make the 
testSpeculateSuccessfulWithUpdateEvents test less flaky? If they are unrelated, 
we should move them to a separate jira; if they are related, we should broaden 
the scope of this jira.
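
To make the effect of the timing change concrete, here is a minimal sketch of 
a generic poll-until-true loop. It is not the MRApp code; the class and method 
names (PollingWaitSketch, waitFor, maxChecks) are hypothetical. It only shows 
that going from a 20 second timeout with a 500 millis interval to a 100 second 
timeout with a 100 millis interval raises the worst-case number of state 
checks from 40 to 1000.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not taken from MRApp.
class PollingWaitSketch {

    // Polls the condition every intervalMs until it holds or timeoutMs elapses.
    // Returns true if the condition held, false on timeout or interruption.
    public static boolean waitFor(BooleanSupplier condition,
                                  long timeoutMs, long intervalMs) {
        long start = System.nanoTime();
        long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - start >= timeoutNanos) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    // Approximate worst-case number of sleep intervals before giving up.
    public static long maxChecks(long timeoutMs, long intervalMs) {
        return timeoutMs / intervalMs;
    }

    public static void main(String[] args) {
        System.out.println(maxChecks(20_000, 500));   // 40
        System.out.println(maxChecks(100_000, 100));  // 1000

        // A condition that becomes true on the third check.
        int[] calls = {0};
        System.out.println(waitFor(() -> ++calls[0] >= 3, 1_000, 10)); // true
    }
}
```

A longer timeout only matters when the awaited state is reached late, so a 
log showing how long the job actually takes to stop would show whether the 
extra 80 seconds is needed.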


TestSpeculativeExecutionWithMRApp. 

Retry rule. The changes to the retry logic seem unrelated to me and should be 
addressed in a separate jira. If they are required to make 
TestSpeculativeExecutionWithMRApp less flaky, please post supporting analysis.

Controlled clock. ControlledClock already maintains its own time and allows 
that time to be incremented. Could we use the ControlledClock API instead of 
maintaining our own clock? This would also make the advanceClock refactoring 
unnecessary.
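
As a rough sketch of the idea, a test can own a single manually advanced 
clock and pass it to the code under test, rather than keeping a separate time 
counter. The ManualClock class below is a hypothetical stand-in written for 
this example, not Hadoop's ControlledClock, and looksSlow is an invented 
placeholder for a speculator-style check; the real class should be consulted 
for its actual method names.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; ManualClock stands in for a test clock.
class ControlledClockSketch {

    // A clock whose time moves only when the test advances it.
    public static final class ManualClock {
        private final AtomicLong nowMs;

        public ManualClock(long startMs) {
            this.nowMs = new AtomicLong(startMs);
        }

        public long getTime() {
            return nowMs.get();
        }

        public void advance(long deltaMs) {
            nowMs.addAndGet(deltaMs);
        }
    }

    // Invented check: a task looks slow once it has run past its estimate.
    public static boolean looksSlow(ManualClock clock, long startMs,
                                    long estimatedRuntimeMs) {
        return clock.getTime() - startMs > estimatedRuntimeMs;
    }

    public static void main(String[] args) {
        ManualClock clock = new ManualClock(0);
        System.out.println(looksSlow(clock, 0, 1000)); // false
        clock.advance(1500);                           // test moves time forward
        System.out.println(looksSlow(clock, 0, 1000)); // true
    }
}
```

Because both the test and the code under test read the same clock object, 
there is no second time value to keep in sync.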

Lastly, SUCCESS or KILLED vs SUCCESS. I actually had this test fail for me 
with a message saying the TaskAttempt state was KILLED when SUCCESS was 
expected. I am trying to reproduce the failure to get a log to post and 
analyze.


> testSpeculateSuccessfulWithUpdateEvents fails Intermittently  
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-7259
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7259
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Minor
>         Attachments: MAPREDUCE-7259.001.patch, MAPREDUCE-7259.002.patch, 
> MAPREDUCE-7259.003.patch, MAPREDUCE-7259.004.patch
>
>
> {{TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithUpdateEvents}} 
> fails intermittently with the exponential estimator. The problem happens 
> because an assertion fails while waiting for the MRApp to stop.
> There may be a need to redesign the test case, because it does not work very 
> well given the races and the timing between the speculator and the 
> tasks. It works fine for the legacy estimator because the estimate is based 
> on start-end rate. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
