[ 
https://issues.apache.org/jira/browse/TEZ-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741746#comment-14741746
 ] 

Bikas Saha commented on TEZ-2798:
---------------------------------

I investigated this. 

The context passed to ContainerLauncherContext is null because it is 
incorrectly passed in the MockDAGAppMaster constructor at a point where the 
context object is null. Whenever the launcher context's methods are invoked, 
they NPE on its context member. So when the mock AM launches the mock 
container, there is an NPE and the container stays in the launching state.
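
The failure mode above can be reproduced in isolation. The following is a 
minimal sketch with hypothetical names (AppContext, LauncherContext and 
MockAppMaster are illustrative stand-ins, not the Tez classes): a wrapper 
captures a field in the constructor before that field has been assigned, so 
every later call through the wrapper throws an NPE.

```java
// Hypothetical stand-ins for illustration only; these are not the Tez classes.
class NullContextDemo {

    interface AppContext {
        void containerLaunched(String containerId);
    }

    // Wrapper that delegates to the context it was constructed with.
    static class LauncherContext {
        private final AppContext context;

        LauncherContext(AppContext context) {
            // No null check here, so a null reference is silently stored.
            this.context = context;
        }

        void containerLaunched(String containerId) {
            // Throws NullPointerException if the stored context is null.
            context.containerLaunched(containerId);
        }
    }

    static class MockAppMaster {
        private AppContext appContext; // still null while the wrapper is built
        final LauncherContext launcherContext;

        MockAppMaster() {
            // Bug: appContext has not been assigned yet, so null is captured.
            this.launcherContext = new LauncherContext(appContext);
            this.appContext = id -> { };
        }
    }

    // Returns true if invoking the wrapper fails with an NPE.
    static boolean launchThrowsNpe() {
        try {
            new MockAppMaster().launcherContext.containerLaunched("container_1");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE on launch: " + launchThrowsNpe());
    }
}
```

In a sketch like this, assigning the field before constructing the wrapper, or 
rejecting null in the wrapper's constructor, would surface the problem at 
construction time instead of at the first callback.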

TEZ-2045 reversed the flow of sending the task spec to the communicator. This 
has the side effect that the container lifecycle becomes disconnected from the 
task lifecycle. Even if the container is in the launching state, the rest of 
the task state machine can proceed because there are no further interactions 
with the AMContainer object after that (in the no-error case).

After the task completes, the local scheduler releases the container and the 
AMContainer transitions from launching to stopped. It NPEs again when the 
stop() callback is called, but the rest of the AM code and the tests pass.

The NPEs are not crashing the AM because the AsyncDispatcher's exit-on-error 
setting is false. In fact, the NPE should not be reaching the AsyncDispatcher 
at all, because the ContainerLauncherManager should catch exceptions thrown 
from service plugins when invoking their methods. In this case, 
ContainerLauncherManager should have caught the exception in the 
plugin.launchContainer() invocation. However, none of the plugin APIs are 
actually declared to throw an exception, so the framework code does not catch 
it and we end up ignoring errors. Creating a jira to track that.
{code}
java.lang.NullPointerException
        at org.apache.tez.dag.app.ContainerLauncherContextImpl.containerLaunched(ContainerLauncherContextImpl.java:47)
        at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.launch(MockDAGAppMaster.java:280)
        at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.launchContainer(MockDAGAppMaster.java:219)
        at org.apache.tez.dag.app.launcher.ContainerLauncherManager.handle(ContainerLauncherManager.java:200)
{code}
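
To illustrate the point about the framework guarding plugin calls, here is a 
minimal sketch, assuming a hypothetical plugin interface whose method declares 
an exception. The names (LauncherPlugin, LauncherManager, handleLaunch) are 
illustrative and are not the Tez API. The manager wraps the plugin invocation 
so that any failure, including a runtime exception such as an NPE, is recorded 
instead of propagating to the dispatcher thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names for illustration only; this is not the Tez plugin API.
class PluginGuardDemo {

    // Declaring 'throws Exception' makes the failure path explicit to callers.
    interface LauncherPlugin {
        void launchContainer(String containerId) throws Exception;
    }

    // Framework-side manager that invokes the plugin on behalf of the AM.
    static class LauncherManager {
        private final LauncherPlugin plugin;
        private final List<String> errors = new ArrayList<>();

        LauncherManager(LauncherPlugin plugin) {
            this.plugin = plugin;
        }

        void handleLaunch(String containerId) {
            try {
                plugin.launchContainer(containerId);
            } catch (Exception e) {
                // Catches checked and runtime exceptions (including NPE)
                // and records them instead of letting them escape.
                errors.add(containerId + ": " + e.getClass().getSimpleName());
            }
        }

        List<String> errors() {
            return errors;
        }
    }

    // Runs one launch against a plugin that always fails with an NPE.
    static List<String> run() {
        LauncherPlugin broken = id -> {
            throw new NullPointerException("context is null");
        };
        LauncherManager manager = new LauncherManager(broken);
        manager.handleLaunch("container_1");
        return manager.errors();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

What the manager does with a recorded error (fail the container, fail the DAG, 
or shut down) is a separate design decision that this sketch leaves open.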

> NPE when executing TestMemoryWithEvents::testMemoryScatterGather
> ----------------------------------------------------------------
>
>                 Key: TEZ-2798
>                 URL: https://issues.apache.org/jira/browse/TEZ-2798
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Priority: Blocker
>             Fix For: 0.8.1
>
>
> {noformat}
> 2015-09-10 05:07:45,885 ERROR [Dispatcher thread: Central] common.AsyncDispatcher (AsyncDispatcher.java:dispatch(188)) - Error in dispatcher thread
> java.lang.NullPointerException
>       at org.apache.tez.dag.app.ContainerLauncherContextImpl.containerLaunched(ContainerLauncherContextImpl.java:47)
>       at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.launch(MockDAGAppMaster.java:280)
>       at org.apache.tez.dag.app.MockDAGAppMaster$MockContainerLauncher.launchContainer(MockDAGAppMaster.java:219)
>       at org.apache.tez.dag.app.launcher.ContainerLauncherManager.handle(ContainerLauncherManager.java:200)
>       at org.apache.tez.dag.app.launcher.ContainerLauncherManager.handle(ContainerLauncherManager.java:46)
>       at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>       at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This wasn't caught in Jenkins because these tests are very long-running and 
> are marked as @Ignore (mainly for internal testing).
> The same exception occurs with testMemoryBroadcast, testMemoryOneToOne, and 
> testMemoryRootInputEvents.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
