[
https://issues.apache.org/jira/browse/TEZ-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741746#comment-14741746
]
Bikas Saha edited comment on TEZ-2798 at 9/11/15 11:55 PM:
-----------------------------------------------------------
I investigated this.
The context held by the ContainerLauncherContext is null because it is
incorrectly passed in the MockDAGAppMaster constructor while the context object
is null. Whenever the launcher context methods are invoked, they NPE on the
context member. So when the mock AM launches the mock container, there is an
NPE and the container stays in the launching state.
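As a rough illustration of the failure mode (a minimal sketch only; AppContextSketch and LauncherContextSketch are hypothetical names, not the real Tez classes), a null check in the constructor would surface the bad wiring at construction time instead of on the first callback:

```java
import java.util.Objects;

/** Hypothetical stand-in for the AM-side context the launcher context delegates to. */
interface AppContextSketch {
    void containerLaunched(String containerId);
}

/** Hypothetical launcher context; NOT the real ContainerLauncherContextImpl. */
final class LauncherContextSketch {
    private final AppContextSketch context;

    LauncherContextSketch(AppContextSketch context) {
        // Fail at construction time rather than on the first callback.
        this.context = Objects.requireNonNull(context, "context must not be null");
    }

    void containerLaunched(String containerId) {
        // Safe to dereference: the constructor rejected a null context.
        context.containerLaunched(containerId);
    }
}
```

With a check like this, constructing the launcher context with a null context fails immediately, which makes the ordering problem in a constructor easier to spot.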
TEZ-2045 reversed the flow of sending the task spec to the communicator. A side
effect is that the container lifecycle becomes disconnected from the task
lifecycle. Even if the container is in the launching state, the rest of the
task state machine can proceed because there are no further interactions with
the AMContainer object after that (in the no-error case).
After the task completes, the local scheduler releases the container and the
AMContainer transitions from launching to stopped. It NPEs again when the
stop() callback is called, but the rest of the AM code/tests pass.
The NPEs are not crashing the AM because the AsyncDispatcher exit-on-error
setting is false. Actually, the NPE should not be reaching the AsyncDispatcher
at all, because the ContainerLauncherManager should catch exceptions thrown
from service plugins when invoking their methods. In this case, the
ContainerLauncherManager should have caught the exception in the
plugin.launchContainer() invocation. However, none of the plugin APIs are
actually declared to throw an exception, so the framework code does not catch
that exception and we end up ignoring errors. Creating TEZ-2815 to track that.
{code}
java.lang.NullPointerException
  at org.apache.tez.dag.app.ContainerLauncherContextImpl.containerLaunched(ContainerLauncherContextImpl.java:47)
  at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.launch(MockDAGAppMaster.java:280)
  at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.launchContainer(MockDAGAppMaster.java:219)
  at org.apache.tez.dag.app.launcher.ContainerLauncherManager.handle(ContainerLauncherManager.java:200)
{code}
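To make the point about catching plugin exceptions concrete, here is a minimal sketch of a manager that isolates plugin failures. LauncherPluginSketch and LauncherManagerSketch are hypothetical names and the real Tez plugin API differs; what TEZ-2815 ends up doing may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plugin interface; the real Tez plugin API differs. */
interface LauncherPluginSketch {
    void launchContainer(String containerId);
}

/** Hypothetical manager that keeps plugin failures away from its caller. */
final class LauncherManagerSketch {
    private final LauncherPluginSketch plugin;
    private final List<String> failures = new ArrayList<>();

    LauncherManagerSketch(LauncherPluginSketch plugin) {
        this.plugin = plugin;
    }

    /** Returns true if the plugin call completed, false if it threw. */
    boolean handleLaunch(String containerId) {
        try {
            plugin.launchContainer(containerId);
            return true;
        } catch (RuntimeException e) {
            // Record the failure so it can be reported instead of ignored.
            failures.add(containerId + ": " + e);
            return false;
        }
    }

    /** Failures recorded so far, in the order they occurred. */
    List<String> failures() {
        return failures;
    }
}
```

The choice here is that an unchecked exception from the plugin never propagates to the dispatcher thread; the manager records it and the caller decides how to react (for example, by failing the container).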
> NPE when executing TestMemoryWithEvents::testMemoryScatterGather
> ----------------------------------------------------------------
>
> Key: TEZ-2798
> URL: https://issues.apache.org/jira/browse/TEZ-2798
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Priority: Blocker
> Fix For: 0.8.1
>
>
> {noformat}
> 2015-09-10 05:07:45,885 ERROR [Dispatcher thread: Central] common.AsyncDispatcher (AsyncDispatcher.java:dispatch(188)) - Error in dispatcher thread
> java.lang.NullPointerException
>   at org.apache.tez.dag.app.ContainerLauncherContextImpl.containerLaunched(ContainerLauncherContextImpl.java:47)
>   at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.launch(MockDAGAppMaster.java:280)
>   at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.launchContainer(MockDAGAppMaster.java:219)
>   at org.apache.tez.dag.app.launcher.ContainerLauncherManager.handle(ContainerLauncherManager.java:200)
>   at org.apache.tez.dag.app.launcher.ContainerLauncherManager.handle(ContainerLauncherManager.java:46)
>   at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This wasn't caught in Jenkins because these tests are very long-running and
> are marked @Ignore (mainly for internal testing).
> The same exception occurs with testMemoryBroadcast, testMemoryOneToOne, and
> testMemoryRootInputEvents.