Sometimes task keeps on running while its Syslog says that it is shutdown
-------------------------------------------------------------------------
Key: MAPREDUCE-3058
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3058
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.0
Reporter: Karam Singh
While running GridMix V3 one got stuck for 15 hrs.
After clicking on Job found one of its reduce got stuck.
Looking at syslog of reducer it found that- :
Task started at -:
2011-09-19 17:57:22,002 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
Scheduled snapshot period at 10 second(s).
2011-09-19 17:57:22,002 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
ReduceTask metrics system started
While end of syslog says -:
2011-09-19 18:06:49,818 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink
as <DATANODE1>
2011-09-19 18:06:49,818 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery
for block BP-1405370709-<NAMENODE>-1316452621953:blk_-7004355226367468317_79871
in pipeline <DATANODE2>, <DATANODE1>: bad datanode <DATANODE1>
2011-09-19 18:06:49,818 DEBUG
org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtocol:
lastAckedSeqno = 26870
2011-09-19 18:06:49,820 DEBUG org.apache.hadoop.ipc.Client: IPC Client
(26613121) connection to <NAMENODE> from gridperf sending #454
2011-09-19 18:06:49,826 DEBUG org.apache.hadoop.ipc.Client: IPC Client
(26613121) connection to <<NAMENODE> from gridperf got value #454
2011-09-19 18:06:49,827 DEBUG org.apache.hadoop.ipc.RPC: Call:
getAdditionalDatanode 8
2011-09-19 18:06:49,827 DEBUG org.apache.hadoop.hdfs.DFSClient: Connecting to
datanode <DATANODE2>
2011-09-19 18:06:49,827 DEBUG org.apache.hadoop.hdfs.DFSClient: Send buf size
131071
2011-09-19 18:06:49,833 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
Exception
java.io.EOFException: Premature EOF: no length prefix available
at
org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:158)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:860)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:838)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:929)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:740)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:415)
2011-09-19 18:06:49,837 WARN org.apache.hadoop.mapred.YarnChild: Exception
running child : java.io.EOFException: Premature EOF: no length prefix available
at
org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:158)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:860)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:838)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:929)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:740)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:415)
2011-09-19 18:06:49,837 DEBUG org.apache.hadoop.ipc.Client: IPC Client
(26613121) connection to <APPMASTER> from job_1316452677984_0862 sending #455
2011-09-19 18:06:49,839 DEBUG org.apache.hadoop.ipc.Client: IPC Client
(26613121) connection to <APPMASTER> from job_1316452677984_0862 got value #455
2011-09-19 18:06:49,840 DEBUG org.apache.hadoop.ipc.RPC: Call: statusUpdate 3
2011-09-19 18:06:49,840 INFO org.apache.hadoop.mapred.Task: Runnning cleanup
for the task
2011-09-19 18:06:49,840 DEBUG org.apache.hadoop.ipc.Client: IPC Client
(26613121) connection to <NAMENODE> from gridperf sending #456
2011-09-19 18:06:49,858 DEBUG org.apache.hadoop.ipc.Client: IPC Client
(26613121) connection to <NAMENODE> from gridperf got value #456
2011-09-19 18:06:49,858 DEBUG org.apache.hadoop.ipc.RPC: Call: delete 18
2011-09-19 18:06:49,858 DEBUG org.apache.hadoop.ipc.Client: IPC Client
(26613121) connection to <APPMASTER> from job_1316452677984_0862 sending #457
2011-09-19 18:06:49,859 DEBUG org.apache.hadoop.ipc.Client: IPC Client
(26613121) connection to <APPMASTER> from job_1316452677984_0862 got value #457
2011-09-19 18:06:49,859 DEBUG org.apache.hadoop.ipc.RPC: Call:
reportDiagnosticInfo 1
2011-09-19 18:06:49,859 DEBUG
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: refCount=1
2011-09-19 18:06:49,859 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
Stopping ReduceTask metrics system...
2011-09-19 18:06:49,859 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
Stopping metrics source UgiMetrics
2011-09-19 18:06:49,859 DEBUG
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: class
org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1
2011-09-19 18:06:49,860 DEBUG org.apache.hadoop.metrics2.util.MBeans:
Unregistering Hadoop:service=ReduceTask,name=UgiMetrics
2011-09-19 18:06:49,860 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
Stopping metrics source JvmMetrics
2011-09-19 18:06:49,860 DEBUG
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: class
org.apache.hadoop.metrics2.source.JvmMetrics
2011-09-19 18:06:49,860 DEBUG org.apache.hadoop.metrics2.util.MBeans:
Unregistering Hadoop:service=ReduceTask,name=JvmMetrics
2011-09-19 18:06:49,860 DEBUG org.apache.hadoop.metrics2.util.MBeans:
Unregistering Hadoop:service=ReduceTask,name=MetricsSystem,sub=Stats
2011-09-19 18:06:49,860 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
ReduceTask metrics system stopped.
2011-09-19 18:06:49,860 DEBUG org.apache.hadoop.metrics2.util.MBeans:
Unregistering Hadoop:service=ReduceTask,name=MetricsSystem,sub=Control
2011-09-19 18:06:49,860 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
ReduceTask metrics system shutdown complete.
Which means tasks is stopped withing 20 secs
Whereas acutally tasks kept on running for more 15 hrs.
>From AM log. also found tasks was sending its update regularly
ps -ef | grep java was also showing task is running
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira