[ 
https://issues.apache.org/jira/browse/TEZ-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313077#comment-14313077
 ] 

Bikas Saha commented on TEZ-2045:
---------------------------------

How can the AMContainer be in any state other than IDLE when a new task has 
been assigned to it? The scheduler will be assigning it a new task only when 
the previous task has reported completion. This transition is being used in 
multiple places because its catching this same multiple assign TA error. 
However, by the time we are done with the error checks earlier in the 
transition, shouldn't we be asserting with a Precondition that we should be in 
IDLE state at this point.
{code}      container.registerAttemptWithListener(amContainerTask);
      container.additionalLocalResources = null;
      container.credentialsChanged = false;
      if (container.getState() == AMContainerState.IDLE) {
        return AMContainerState.RUNNING;
      } else {
        return container.getState();
      }{code}

Does NodeFailedBaseTransition need a call to 
container.unregisterFromTAListener()?

testMultipleAllocationsAtIdle() and testAllocationAtRunning() should still be 
valid but looks like they have been removed? New test 
testMultipleAllocationsWhileActive() seems to cover only 
testAllocationAtRunning().

testNodeFailedAtIdle() has been replaced by testNodeFailedAtRunning()?

testContainerCompletedAtIdle() shouldn't have needed modification because the 
behavior is still the same. The new code is still sending out container 
terminated requests for current attempt that were sent for pending attempt 
earlier

Not sure how testGetTaskMultiplePulls() is simulating container reuse without 
actually creating another AMContainerTask and making sure both are sent with 
the expected flow. Are there any negative cases in this flow where multiple 
getTasks should not actually happen? Eg. getTask being called when the task has 
not finished yet.

Earlier, getTask() was essentially reaching into the AMContainer and exercising 
its state machine via C_PULL_TA. So any incorrect state would result in an 
error. But now getTask() is simply returning the AMContainerTask registered at 
some time in time without checking what the current AMContainer state is. How 
can we be sure that its not performing an inconsistent operation and sending a 
remotePull event to the TaskAttempt?

Essentially, if the running of the task is taken over by the 
TaskAttemptListener, then shouldn't the AMContainer only represent the presence 
of running executor. Which would imply that there is no need for an IDLE state. 
It could just move from LAUNCHING to RUNNING. The RUNNING state can also error 
out when another attempt is assigned while there is a current attempt already 
assigned.

In the context of external services. Will the TaskAttemptListener expose 
AMContainerTask to external services implementing TaskAtttemptListener?


> TaskAttemptListener should not pull Tasks from AMContainer
> ----------------------------------------------------------
>
>                 Key: TEZ-2045
>                 URL: https://issues.apache.org/jira/browse/TEZ-2045
>             Project: Apache Tez
>          Issue Type: Task
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: TEZ-2045.1.txt, TEZ-2045.2.txt
>
>
> The flow should be reversed where AMContainer registers with the 
> TaskAttemptListener, like it's done for all other operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to