[jira] [Commented] (MAPREDUCE-6457) kill task before the task is truely runing on a slave node

Wei Chen (JIRA) Wed, 19 Aug 2015 12:05:21 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703579#comment-14703579
 ]


Wei Chen commented on MAPREDUCE-6457:
-------------------------------------

2015-08-19 11:52:27,982 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: receive kill
commandtask_1440006609204_0002_m_000002
2015-08-19 11:52:27,982 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:2 Task Transitioned
from SCHEDULED to KILL_WAIT
2015-08-19 11:52:27,982 INFO [ContainerLauncher #2]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl:
Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container
container_1440006609204_0002_01_000003 taskAttempt
attempt_1440006609204_0002_m_000001_0
2015-08-19 11:52:27,982 INFO [ContainerLauncher #2]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl:
Launching attempt_1440006609204_0002_m_000001_0
2015-08-19 11:52:27,982 INFO [ContainerLauncher #2]
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy:
Opening proxy : jassion-OptiPlex-780:33640
2015-08-19 11:52:27,983 INFO [ContainerLauncher #2]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: launch
remote task
onjassion-OptiPlex-780:33640fromattempt_1440006609204_0002_m_000001_0
2015-08-19 11:52:28,277 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: one time for
request containers
2015-08-19 11:52:28,278 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: container
on*priority:20number of containers0
2015-08-19 11:52:28,278 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: container
on/default-rackpriority:20number of containers0
2015-08-19 11:52:28,278 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: container
onjassion-OptiPlex-780priority:20number of containers0
2015-08-19 11:52:28,278 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources()
for application_1440006609204_0002: ask=3 release= 0 newContainers=0
finishedContainers=0 resourcelimit=<memory:4096, vCores:-2> knownNMs=1
2015-08-19 11:52:28,312 INFO [ContainerLauncher #2]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle
port returned by ContainerManager for attempt_1440006609204_0002_m_000001_0
: 13562
2015-08-19 11:52:28,358 FATAL [AsyncDispatcher event handler]
org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.NullPointerException
        at
java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
        at
java.util.concurrent.ConcurrentHashMap.containsKey(ConcurrentHashMap.java:1016)
        at java.util.Collections$SetFromMap.contains(Collections.java:3901)
        at
org.apache.hadoop.mapred.TaskAttemptListenerImpl.unregister(TaskAttemptListenerImpl.java:500)
        at
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$CleanupContainerTransition.transition(TaskAttemptImpl.java:1966)
        at
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$CleanupContainerTransition.transition(TaskAttemptImpl.java:1958)
        at
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
        at
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1073)
        at
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:148)
        at
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1311)
        at
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1303)
        at
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:745)
2015-08-19 11:52:28,853 INFO [AsyncDispatcher event handler]
org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

These are the log files I got.  I just try to kill one task
task_1440006609204_0002_m_000002,  this task is just during  SCHED phase
not RUN yet.  After it's killed, then come to the exception.

The problem maybe due to the task is trying unregister from
TaskAttemptListenerImpl while it's not register witl
TaskAttemptListenerImpl yet.



On Wed, Aug 19, 2015 at 11:38 AM, Karthik Kambatla (JIRA) <[email protected]>



> kill task before the task is truely runing on a slave node
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-6457
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6457
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 2.6.0, 2.7.1
>         Environment: regular environment
>            Reporter: Wei Chen
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I happned to find this bug. When I am doing some development based on 
> Hadoop-2.6.0.  I happend to send to T_KILL signal to one of maptask before 
> this task is register to TaskAttemptListener. Because the attempt is not 
> assigned with a container yet. It may lead to a lot of errors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MAPREDUCE-6457) kill task before the task is truely runing on a slave node

Reply via email to