[
https://issues.apache.org/jira/browse/MAPREDUCE-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Varun Saxena updated MAPREDUCE-6428:
------------------------------------
Description:
We found that although the job succeeds (it is shown as successful in the RM),
the job can't be seen in JobHistory because writing the jhist file fails.
{noformat}
2015-07-04 14:54:37,852 INFO [Thread-94] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state STOPPED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: All datanodes 9.96.1.171:25009 are bad. Aborting...
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: All datanodes 9.96.1.171:25009 are bad. Aborting...
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:546)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:340)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
    at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1609)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1107)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:554)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:603)
Caused by: java.io.IOException: All datanodes 9.96.1.171:25009 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1145)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:926)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
2015-07-04 14:54:37,853 WARN [Thread-94] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Graceful stop failed. Exiting..
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: All datanodes 9.96.1.171:25009 are bad. Aborting...
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:546)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:340)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
    at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1609)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1107)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:554)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:603)
Caused by: java.io.IOException: All datanodes 9.96.1.171:25009 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1145)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:926)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
2015-07-04 14:54:37,854 INFO [Thread-94] org.apache.hadoop.util.ExitUtil: Exiting with status 1
{noformat}
We can probably mark the job as failed and inform the RM if this happens.
Thoughts?
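To make the proposal concrete, here is a minimal sketch of the idea, not the actual MRAppMaster code. It assumes the history flush can be attempted before the final status is reported; whether the real shutdown sequence allows that ordering is not established here. The names HistoryAwareShutdown, HistoryWriter, StatusReporter and shutDown are hypothetical and exist only for illustration.

```java
import java.io.IOException;

public class HistoryAwareShutdown {

    /** Simplified final job status; the real AM uses richer state. */
    public enum FinalStatus { SUCCEEDED, FAILED }

    /** Hypothetical stand-in for whatever flushes and closes the jhist file. */
    public interface HistoryWriter {
        void flushAndClose() throws IOException;
    }

    /** Hypothetical stand-in for the call that reports the final status to the RM. */
    public interface StatusReporter {
        void report(FinalStatus status, String diagnostics);
    }

    /**
     * Flushes job history first, then reports the final status.
     * If the flush fails, the job is reported as FAILED with a diagnostic
     * message instead of being reported as SUCCEEDED with no history.
     */
    public static FinalStatus shutDown(FinalStatus jobStatus,
                                       HistoryWriter writer,
                                       StatusReporter reporter) {
        FinalStatus reported = jobStatus;
        String diagnostics = "";
        try {
            writer.flushAndClose();
        } catch (IOException | RuntimeException e) {
            // The history write failed: downgrade the reported status.
            reported = FinalStatus.FAILED;
            diagnostics = "Job history could not be written: " + e.getMessage();
        }
        reporter.report(reported, diagnostics);
        return reported;
    }

    public static void main(String[] args) {
        // Simulate the HDFS failure seen in the log above.
        HistoryWriter failing = () -> {
            throw new IOException("All datanodes are bad. Aborting...");
        };
        StatusReporter console =
            (status, diag) -> System.out.println(status + " " + diag);
        shutDown(FinalStatus.SUCCEEDED, failing, console);
    }
}
```

With this shape, a job whose history cannot be persisted would show up as failed in the RM with a diagnostic, rather than as successful but missing from JobHistory.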
was:
We found that although the job succeeds (it is shown as successful in the RM),
the job can't be seen in JobHistory because writing the jhist file fails.
We can probably mark the job as failed if this happens. Thoughts?
> Job History can be lost if there is any issue in writing jhist file while AM
> shuts down job
> -------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6428
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6428
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.4.1
> Reporter: Varun Saxena
> Assignee: Varun Saxena
>
> We found that although the job succeeds (it is shown as successful in the RM),
> the job can't be seen in JobHistory because writing the jhist file fails.
> We can probably mark the job as failed and inform the RM if this happens.
> Thoughts?