Sometimes task keeps on running while its Syslog says that it is shutdown
-------------------------------------------------------------------------

                 Key: MAPREDUCE-3058
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3058
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 0.23.0
            Reporter: Karam Singh


While running GridMix V3 one got stuck for 15 hrs.
After clicking on Job found one of its reduce got stuck.
Looking at syslog of reducer it found that- :
Task started at -: 
2011-09-19 17:57:22,002 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
Scheduled snapshot period at 10 second(s).
2011-09-19 17:57:22,002 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
ReduceTask metrics system started

While end of syslog says -:
2011-09-19 18:06:49,818 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 
as <DATANODE1>
2011-09-19 18:06:49,818 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery 
for block BP-1405370709-<NAMENODE>-1316452621953:blk_-7004355226367468317_79871 
in pipeline  <DATANODE2>,  <DATANODE1>: bad datanode  <DATANODE1>
2011-09-19 18:06:49,818 DEBUG 
org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtocol: 
lastAckedSeqno = 26870
2011-09-19 18:06:49,820 DEBUG org.apache.hadoop.ipc.Client: IPC Client 
(26613121) connection to <NAMENODE> from gridperf sending #454
2011-09-19 18:06:49,826 DEBUG org.apache.hadoop.ipc.Client: IPC Client 
(26613121) connection to <<NAMENODE> from gridperf got value #454
2011-09-19 18:06:49,827 DEBUG org.apache.hadoop.ipc.RPC: Call: 
getAdditionalDatanode 8
2011-09-19 18:06:49,827 DEBUG org.apache.hadoop.hdfs.DFSClient: Connecting to 
datanode <DATANODE2>
2011-09-19 18:06:49,827 DEBUG org.apache.hadoop.hdfs.DFSClient: Send buf size 
131071
2011-09-19 18:06:49,833 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer 
Exception
java.io.EOFException: Premature EOF: no length prefix available
        at 
org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:158)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:860)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:838)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:929)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:740)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:415)
2011-09-19 18:06:49,837 WARN org.apache.hadoop.mapred.YarnChild: Exception 
running child : java.io.EOFException: Premature EOF: no length prefix available
        at 
org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:158)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:860)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:838)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:929)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:740)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:415)

2011-09-19 18:06:49,837 DEBUG org.apache.hadoop.ipc.Client: IPC Client 
(26613121) connection to <APPMASTER> from job_1316452677984_0862 sending #455
2011-09-19 18:06:49,839 DEBUG org.apache.hadoop.ipc.Client: IPC Client 
(26613121) connection to <APPMASTER> from job_1316452677984_0862 got value #455
2011-09-19 18:06:49,840 DEBUG org.apache.hadoop.ipc.RPC: Call: statusUpdate 3
2011-09-19 18:06:49,840 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
for the task
2011-09-19 18:06:49,840 DEBUG org.apache.hadoop.ipc.Client: IPC Client 
(26613121) connection to <NAMENODE> from gridperf sending #456
2011-09-19 18:06:49,858 DEBUG org.apache.hadoop.ipc.Client: IPC Client 
(26613121) connection to <NAMENODE> from gridperf got value #456
2011-09-19 18:06:49,858 DEBUG org.apache.hadoop.ipc.RPC: Call: delete 18
2011-09-19 18:06:49,858 DEBUG org.apache.hadoop.ipc.Client: IPC Client 
(26613121) connection to <APPMASTER> from job_1316452677984_0862 sending #457
2011-09-19 18:06:49,859 DEBUG org.apache.hadoop.ipc.Client: IPC Client 
(26613121) connection to <APPMASTER> from job_1316452677984_0862 got value #457
2011-09-19 18:06:49,859 DEBUG org.apache.hadoop.ipc.RPC: Call: 
reportDiagnosticInfo 1
2011-09-19 18:06:49,859 DEBUG 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: refCount=1
2011-09-19 18:06:49,859 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
Stopping ReduceTask metrics system...
2011-09-19 18:06:49,859 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
Stopping metrics source UgiMetrics
2011-09-19 18:06:49,859 DEBUG 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: class 
org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1
2011-09-19 18:06:49,860 DEBUG org.apache.hadoop.metrics2.util.MBeans: 
Unregistering Hadoop:service=ReduceTask,name=UgiMetrics
2011-09-19 18:06:49,860 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
Stopping metrics source JvmMetrics
2011-09-19 18:06:49,860 DEBUG 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: class 
org.apache.hadoop.metrics2.source.JvmMetrics
2011-09-19 18:06:49,860 DEBUG org.apache.hadoop.metrics2.util.MBeans: 
Unregistering Hadoop:service=ReduceTask,name=JvmMetrics
2011-09-19 18:06:49,860 DEBUG org.apache.hadoop.metrics2.util.MBeans: 
Unregistering Hadoop:service=ReduceTask,name=MetricsSystem,sub=Stats
2011-09-19 18:06:49,860 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
ReduceTask metrics system stopped.
2011-09-19 18:06:49,860 DEBUG org.apache.hadoop.metrics2.util.MBeans: 
Unregistering Hadoop:service=ReduceTask,name=MetricsSystem,sub=Control
2011-09-19 18:06:49,860 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
ReduceTask metrics system shutdown complete.

Which means tasks is stopped withing 20 secs
Whereas acutally tasks kept on running for more 15 hrs.
>From AM log. also found tasks was sending its update regularly 
ps -ef | grep java was also showing task is running


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to