[ https://issues.apache.org/jira/browse/TEZ-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317224#comment-14317224 ]
Bikas Saha commented on TEZ-2082:
---------------------------------
This is likely a race condition introduced in TEZ-2045 and hence I am removing
the 0.6.1 target version and reducing priority.
Explanation below. /cc [~sseth]
In TaskAttemptListenerImpTezDag.getTask() (TaskAttemptListenerImpTezDag.java):
{code}
// registeredContainers.containsKey(containerId) returns true here,
// so the else branch is taken
if (!registeredContainers.containsKey(containerId)) {
  if (context.getAllContainers().get(containerId) == null) {
    LOG.info("Container with id: " + containerId
        + " is invalid and will be killed");
  } else {
    LOG.info("Container with id: " + containerId
        + " is valid, but no longer registered, and will be killed");
  }
  task = TASK_FOR_INVALID_JVM;
} else {
  pingContainerHeartbeatHandler(containerId);
  // registeredContainers returns null for the same containerId inside
  // getContainerTask, so it returns TASK_FOR_INVALID_JVM, but the code
  // below only checks for null
  task = getContainerTask(containerId);
  if (task == null) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("No task current assigned to Container with id: " +
          containerId);
    }
  } else {
    context.getEventHandler().handle(
        // so it crashes here while accessing
        // getTaskSpec().getTaskAttemptID(), since that is null for
        // TASK_FOR_INVALID_JVM
        new TaskAttemptEventStartedRemotely(task.getTaskSpec()
            .getTaskAttemptID(), containerId, context
            .getApplicationACLs()));
    LOG.info("Container with id: " + containerId + " given task: "
        + task.getTaskSpec().getTaskAttemptID());
  }
}
{code}
I can't think of any way to test for this race condition, so I added a
precondition that will help catch it more easily if it occurs again.
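For readers following along, the check-then-act race and a fail-fast guard can be sketched in isolation. This is only an illustration and not the contents of TEZ-2082.1.patch: the class GetTaskRaceSketch, the simplified Task type, the String-based task spec, and the explicit sentinel check are all assumptions made for the example. Only the names registeredContainers, getContainerTask, getTask and TASK_FOR_INVALID_JVM are borrowed from the snippet above.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class GetTaskRaceSketch {
    /** Hypothetical stand-in for the real task type; taskSpec is null for the sentinel. */
    static final class Task {
        final String taskSpec;
        Task(String taskSpec) { this.taskSpec = taskSpec; }
    }

    /** Sentinel returned for containers that are not (or no longer) registered. */
    static final Task TASK_FOR_INVALID_JVM = new Task(null);

    // containerId -> assigned task spec; "" means registered but nothing assigned yet.
    static final Map<String, String> registeredContainers = new ConcurrentHashMap<>();

    /** Does its own lookup, so it can disagree with an earlier containsKey() check. */
    static Task getContainerTask(String containerId) {
        String spec = registeredContainers.get(containerId);
        if (spec == null) {
            return TASK_FOR_INVALID_JVM; // container was unregistered in the meantime
        }
        return spec.isEmpty() ? null : new Task(spec);
    }

    /** Handles both null and the sentinel before touching the task spec. */
    static String getTask(String containerId) {
        Task task = getContainerTask(containerId);
        if (task == null) {
            return "no task assigned";
        }
        if (task == TASK_FOR_INVALID_JVM) {
            return "kill container"; // assumed handling, not taken from the patch
        }
        // Precondition: fail with a descriptive message instead of a bare NPE.
        Objects.requireNonNull(task.taskSpec,
            "task spec must not be null for container " + containerId);
        return "run " + task.taskSpec;
    }

    public static void main(String[] args) {
        registeredContainers.put("c1", "attempt_1");
        System.out.println(getTask("c1")); // prints "run attempt_1"
        registeredContainers.remove("c1"); // simulates the concurrent unregister
        System.out.println(getTask("c1")); // prints "kill container"
    }
}
```

The point of the sketch is that the result of getContainerTask is the only value the caller acts on, so a separate containsKey() check made earlier cannot go stale between the two lookups.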
> Failing test: TestPreemption::testPreemptionWithSession/
> --------------------------------------------------------
>
> Key: TEZ-2082
> URL: https://issues.apache.org/jira/browse/TEZ-2082
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Hitesh Shah
> Assignee: Bikas Saha
> Attachments: TEZ-2082.1.patch
>
>
> From https://builds.apache.org/job/Tez-Build/891/testReport/junit/org.apache.tez.dag.app/TestPreemption/testPreemptionWithSession/
> Exception in thread "Thread-27" java.lang.NullPointerException
>     at org.apache.tez.dag.app.TaskAttemptListenerImpTezDag.getTask(TaskAttemptListenerImpTezDag.java:222)
>     at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.run(MockDAGAppMaster.java:230)
>     at java.lang.Thread.run(Thread.java:662)