[
https://issues.apache.org/jira/browse/TEZ-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313077#comment-14313077
]
Bikas Saha commented on TEZ-2045:
---------------------------------
How can the AMContainer be in any state other than IDLE when a new task has
been assigned to it? The scheduler will be assigning it a new task only when
the previous task has reported completion. This transition is being used in
multiple places because its catching this same multiple assign TA error.
However, by the time we are done with the error checks earlier in the
transition, shouldn't we be asserting with a Precondition that we should be in
IDLE state at this point.
{code} container.registerAttemptWithListener(amContainerTask);
container.additionalLocalResources = null;
container.credentialsChanged = false;
if (container.getState() == AMContainerState.IDLE) {
return AMContainerState.RUNNING;
} else {
return container.getState();
}{code}
Does NodeFailedBaseTransition need a call to
container.unregisterFromTAListener()?
testMultipleAllocationsAtIdle() and testAllocationAtRunning() should still be
valid but looks like they have been removed? New test
testMultipleAllocationsWhileActive() seems to cover only
testAllocationAtRunning().
testNodeFailedAtIdle() has been replaced by testNodeFailedAtRunning()?
testContainerCompletedAtIdle() shouldn't have needed modification because the
behavior is still the same. The new code is still sending out container
terminated requests for current attempt that were sent for pending attempt
earlier
Not sure how testGetTaskMultiplePulls() is simulating container reuse without
actually creating another AMContainerTask and making sure both are sent with
the expected flow. Are there any negative cases in this flow where multiple
getTasks should not actually happen? Eg. getTask being called when the task has
not finished yet.
Earlier, getTask() was essentially reaching into the AMContainer and exercising
its state machine via C_PULL_TA. So any incorrect state would result in an
error. But now getTask() is simply returning the AMContainerTask registered at
some time in time without checking what the current AMContainer state is. How
can we be sure that its not performing an inconsistent operation and sending a
remotePull event to the TaskAttempt?
Essentially, if the running of the task is taken over by the
TaskAttemptListener, then shouldn't the AMContainer only represent the presence
of running executor. Which would imply that there is no need for an IDLE state.
It could just move from LAUNCHING to RUNNING. The RUNNING state can also error
out when another attempt is assigned while there is a current attempt already
assigned.
In the context of external services. Will the TaskAttemptListener expose
AMContainerTask to external services implementing TaskAtttemptListener?
> TaskAttemptListener should not pull Tasks from AMContainer
> ----------------------------------------------------------
>
> Key: TEZ-2045
> URL: https://issues.apache.org/jira/browse/TEZ-2045
> Project: Apache Tez
> Issue Type: Task
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: TEZ-2045.1.txt, TEZ-2045.2.txt
>
>
> The flow should be reversed where AMContainer registers with the
> TaskAttemptListener, like it's done for all other operations.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)