[
https://issues.apache.org/jira/browse/TEZ-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148151#comment-14148151
]
Bikas Saha commented on TEZ-1624:
---------------------------------
Below is the original code with some notes.
A couple of comments:
1) The test notification should happen only when the peek() result is empty
and we really are going to wait.
2) The initial peek() at this point seems useless. Could we refactor a bit,
since we eventually need to do the peek() inside sync(this) anyway?
3) Does the addition of the container to delayedContainers need to move inside
the sync(this) in addDelayedContainers()?
{code}
// Try allocating containers which have timed out.
// Required since these containers may get assigned without
// locality at this point.
if (delayedContainers.peek() == null) {
  try {
    // test only signaling to make TestTaskScheduler work
    if (drainedDelayedContainersForTest != null) { // <<<<< THIS SHOULD HAPPEN BEFORE WE REALLY WAIT
      drainedDelayedContainersForTest.set(true);
      synchronized (drainedDelayedContainersForTest) {
        drainedDelayedContainersForTest.notifyAll();
      }
    }
    synchronized(this) {
      this.wait();
    }
    // Re-loop to see if tryAssignAll is set.
    continue;
  } catch (InterruptedException e) {
    LOG.info("AllocatedContainerManager Thread interrupted");
  }
} else {
{code}
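To make the three comments above concrete, here is a rough standalone sketch of the shape I have in mind. It is not the Tez code: the class DelayedQueueSketch, the String containers, and the handle(), shutdown() and demo() helpers are all made up for illustration. The only point is that a single peek() under synchronized(this) decides whether to wait, the test signal fires just before the wait, and the add happens under the same lock.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the delayed container thread; names are illustrative only.
class DelayedQueueSketch implements Runnable {
  private final Queue<String> delayedContainers = new ArrayDeque<>();
  // Test-only hook in the spirit of drainedDelayedContainersForTest; may be null.
  private final AtomicBoolean drainedForTest;
  final List<String> assigned = new CopyOnWriteArrayList<>();
  private boolean running = true; // guarded by synchronized(this)

  DelayedQueueSketch(AtomicBoolean drainedForTest) {
    this.drainedForTest = drainedForTest;
  }

  // Comment 3: add under the same lock, so an add cannot slip in between
  // the consumer's emptiness check and its wait().
  synchronized void addDelayedContainer(String container) {
    delayedContainers.add(container);
    notifyAll();
  }

  synchronized void shutdown() {
    running = false;
    notifyAll();
  }

  @Override
  public void run() {
    while (true) {
      String container;
      synchronized (this) {
        // Comments 1 and 2: one peek(), under the lock, decides whether we wait.
        while (running && delayedContainers.peek() == null) {
          signalDrainedForTest(); // only when we really are about to wait
          try {
            wait();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
        }
        if (!running) {
          return;
        }
        container = delayedContainers.poll();
      }
      handle(container); // do the actual work outside the lock
    }
  }

  private void signalDrainedForTest() {
    if (drainedForTest != null) {
      synchronized (drainedForTest) {
        drainedForTest.set(true);
        drainedForTest.notifyAll();
      }
    }
  }

  // Placeholder for the real assignment logic.
  private void handle(String container) {
    assigned.add(container);
  }

  // Small driver: queue two containers, run the loop, wait for the drained signal.
  static List<String> demo() {
    AtomicBoolean drained = new AtomicBoolean(false);
    DelayedQueueSketch mgr = new DelayedQueueSketch(drained);
    mgr.addDelayedContainer("c1");
    mgr.addDelayedContainer("c2");
    Thread t = new Thread(mgr);
    t.start();
    try {
      synchronized (drained) {
        while (!drained.get()) {
          drained.wait();
        }
      }
      mgr.shutdown();
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return mgr.assigned;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [c1, c2]
  }
}
```

With this shape the test can block on the drained flag instead of sleeping, because the flag is only set once the queue has been observed empty under the lock.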
This fix should have reduced the sleeps in the test or kept them the same and
not increased them :) Could you please revert all the sleep increases and check
again? We would like to avoid making tests run longer.
> Flaky tests in TestContainerReuse
> ---------------------------------
>
> Key: TEZ-1624
> URL: https://issues.apache.org/jira/browse/TEZ-1624
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-1624.1.patch
>
>
> A couple of TestContainerReuse tests are failing due to a minor race condition in
> the DelayedContainerManager thread.
> Wanted but not invoked:
> taskSchedulerEventHandlerForTest.taskAllocated(
> Mock for TaskAttempt, hashCode: 290467934,
> <any>,
> Container: [ContainerId: container_1_0001_01_000001, NodeId: host1:0,
> NodeHttpAddress: host1:0, Resource: <memory:1024, vCores:1>, Priority: 1,
> Token: null, ]
> );
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:580)
> However, there were other interactions with this mock:
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:531)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:531)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:531)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:532)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:532)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:532)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:532)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:534)
> -> at
> org.apache.tez.dag.app.rm.TaskSchedulerAppCallbackWrapper$SetApplicationRegistrationDataCallable.call(TaskSchedulerAppCallbackWrapper.java:244)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:570)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:571)
> Wanted but not invoked:
> taskSchedulerEventHandlerForTest.taskAllocated(
> Mock for TaskAttempt, hashCode: 392638651,
> <any>,
> Container: [ContainerId: container_1_0001_01_000001, NodeId: host1:0,
> NodeHttpAddress: host1:0, Resource: <memory:1024, vCores:1>, Priority: 5,
> Token: null, ]
> );
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:333)
> However, there were other interactions with this mock:
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:289)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:289)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:289)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:290)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:290)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:290)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:290)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:292)
> -> at
> org.apache.tez.dag.app.rm.TaskSchedulerAppCallbackWrapper$SetApplicationRegistrationDataCallable.call(TaskSchedulerAppCallbackWrapper.java:244)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:323)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:324)
> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerNotAvailable(TestContainerReuse.java:333)
> org.mockito.exceptions.verification.WantedButNotInvoked:
> Wanted but not invoked:
> taskSchedulerEventHandlerForTest.taskAllocated(
> Mock for TaskAttempt, hashCode: 1830222901,
> <any>,
> Container: [ContainerId: container_1_0001_01_000001, NodeId: host1:0,
> NodeHttpAddress: host1:0, Resource: <memory:1024, vCores:1>, Priority: 3,
> Token: null, ]
> );
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:909)
> However, there were other interactions with this mock:
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:861)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:861)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:861)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:862)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:862)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:862)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:862)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:864)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:900)
> -> at
> org.apache.tez.dag.app.rm.TaskSchedulerAppCallbackWrapper$SetApplicationRegistrationDataCallable.call(TaskSchedulerAppCallbackWrapper.java:244)
> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testReuseAcrossVertices(TestContainerReuse.java:909)
> testDelayedReuseContainerBecomesAvailable(org.apache.tez.dag.app.rm.TestContainerReuse)
> Time elapsed: 0.053 sec <<< FAILURE!
> org.mockito.exceptions.verification.WantedButNotInvoked:
> Wanted but not invoked:
> taskSchedulerEventHandlerForTest.taskAllocated(
> Mock for TaskAttempt, hashCode: 1829491577,
> <any>,
> Container: [ContainerId: container_1_0001_01_000001, NodeId: host1:0,
> NodeHttpAddress: host1:0, Resource: <memory:1024, vCores:1>, Priority: 5,
> Token: null, ]
> );
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:202)
> However, there were other interactions with this mock:
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:151)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:151)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:151)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:152)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:152)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:152)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:152)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:154)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:191)
> -> at
> org.apache.tez.dag.app.rm.TestContainerReuse.testDelayedReuseContainerBecomesAvailable(TestContainerReuse.java:192)
> -> at
> org.apache.tez.dag.app.rm.TaskSchedulerAppCallbackWrapper$SetApplicationRegistrationDataCallable.call(TaskSchedulerAppCallbackWrapper.java:244)