[ https://issues.apache.org/jira/browse/TEZ-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317224#comment-14317224 ]

Bikas Saha commented on TEZ-2082:
---------------------------------

This is likely a race condition introduced in TEZ-2045, so I am removing the
0.6.1 target version and reducing the priority.
Explanation below. /cc [~sseth]

In TaskAttemptListenerImpTezDag.getTask (TaskAttemptListenerImpTezDag.java):
{code}
=== registeredContainers.containsKey(containerId) returns true here ===
      if (!registeredContainers.containsKey(containerId)) {
        if(context.getAllContainers().get(containerId) == null) {
          LOG.info("Container with id: " + containerId
              + " is invalid and will be killed");
        } else {
          LOG.info("Container with id: " + containerId
              + " is valid, but no longer registered, and will be killed");
        }
        task = TASK_FOR_INVALID_JVM;
      } else {
        pingContainerHeartbeatHandler(containerId);
=== the registeredContainers lookup returns null for the same containerId inside getContainerTask,
=== so it returns TASK_FOR_INVALID_JVM, but the code below only checks for null ===
        task = getContainerTask(containerId);
        if (task == null) {
          if (LOG.isDebugEnabled()) {
            LOG.debug("No task current assigned to Container with id: " +
                containerId);
          }
        } else {
            context.getEventHandler().handle(
=== so it crashes here while accessing getTaskSpec().getTaskAttemptID(), since the task spec is null for TASK_FOR_INVALID_JVM ===
                new TaskAttemptEventStartedRemotely(task.getTaskSpec()
                    .getTaskAttemptID(), containerId, context
                    .getApplicationACLs()));
            LOG.info("Container with id: " + containerId + " given task: "
                + task.getTaskSpec().getTaskAttemptID());
        }
      }{code}
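
To make the sequence annotated above concrete, here is a minimal, deterministic sketch of the same check-then-act pattern. It is not the Tez code: the class SentinelRaceSketch, the Task type, the registered map and the betweenCheckAndAct hook are all hypothetical stand-ins, and the hook merely simulates another thread unregistering the container between the two lookups.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical miniature of the pattern described above; not the Tez classes.
public class SentinelRaceSketch {

    // Stand-in for a container task; taskSpec is null only for the sentinel.
    static final class Task {
        final String taskSpec;
        Task(String taskSpec) { this.taskSpec = taskSpec; }
    }

    // Sentinel returned for containers that are no longer registered.
    static final Task TASK_FOR_INVALID_JVM = new Task(null);

    // containerId -> assigned task spec ("" means nothing assigned yet).
    static final ConcurrentMap<String, String> registered =
        new ConcurrentHashMap<>();

    // Second lookup: re-reads the map and returns the sentinel, not null,
    // when the container has disappeared in the meantime.
    static Task getContainerTask(String containerId) {
        String spec = registered.get(containerId);
        if (spec == null) {
            return TASK_FOR_INVALID_JVM;
        }
        return spec.isEmpty() ? null : new Task(spec);
    }

    static String getTask(String containerId, Runnable betweenCheckAndAct) {
        if (!registered.containsKey(containerId)) {   // first lookup
            return "invalid container";
        }
        betweenCheckAndAct.run();                     // simulated concurrent unregister
        Task task = getContainerTask(containerId);    // second lookup
        if (task == null) {                           // the sentinel is not null
            return "no task assigned";
        }
        return "given task: " + task.taskSpec.trim(); // NPE if task is the sentinel
    }

    public static void main(String[] args) {
        registered.put("container_1", "attempt_1");
        try {
            getTask("container_1", () -> registered.remove("container_1"));
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the reported stack trace");
        }
    }
}
```

In the sketch the window between the two lookups is forced open by the hook; in the real code it only opens when another thread happens to unregister the container at that moment, which is why the failure is intermittent.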

I can't think of any way to test for this race condition, so I added a
precondition that will help catch it more easily if it occurs again.
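
The attached patch is not reproduced here. The following is only a sketch of what a fail-fast precondition of this kind could look like, assuming the goal is to replace the bare NullPointerException with an error that names the container. The class SentinelGuardSketch, the Task type, the checkState helper and the message text are hypothetical and are not taken from the patch.

```java
// Hypothetical sketch of a fail-fast precondition; not the actual patch.
public class SentinelGuardSketch {

    // Stand-in for a container task; taskSpec is null only for the sentinel.
    static final class Task {
        final String taskSpec;
        Task(String taskSpec) { this.taskSpec = taskSpec; }
    }

    static final Task TASK_FOR_INVALID_JVM = new Task(null);

    // Hand-written equivalent of a checkState-style precondition.
    static void checkState(boolean ok, String message) {
        if (!ok) {
            throw new IllegalStateException(message);
        }
    }

    // Builds the "given task" log line, failing fast with context when the
    // task carries no task spec (as the sentinel does).
    static String startedMessage(Task task, String containerId) {
        checkState(task != null && task.taskSpec != null,
            "Container " + containerId + " received a task without a task spec;"
            + " it may have been unregistered between lookups");
        return "Container with id: " + containerId + " given task: " + task.taskSpec;
    }

    public static void main(String[] args) {
        System.out.println(startedMessage(new Task("attempt_1"), "container_1"));
        try {
            startedMessage(TASK_FOR_INVALID_JVM, "container_1");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like this does not close the race; it only turns a hard-to-read NullPointerException into a failure whose message points at the container involved.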

> Failing test: TestPreemption::testPreemptionWithSession/
> --------------------------------------------------------
>
>                 Key: TEZ-2082
>                 URL: https://issues.apache.org/jira/browse/TEZ-2082
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hitesh Shah
>            Assignee: Bikas Saha
>         Attachments: TEZ-2082.1.patch
>
>
> From https://builds.apache.org/job/Tez-Build/891/testReport/junit/org.apache.tez.dag.app/TestPreemption/testPreemptionWithSession/
> Exception in thread "Thread-27" java.lang.NullPointerException
>       at org.apache.tez.dag.app.TaskAttemptListenerImpTezDag.getTask(TaskAttemptListenerImpTezDag.java:222)
>       at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.run(MockDAGAppMaster.java:230)
>       at java.lang.Thread.run(Thread.java:662)



