[ https://issues.apache.org/jira/browse/MAPREDUCE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490348#comment-17490348 ]
Raman Chodźka commented on MAPREDUCE-4950:
------------------------------------------

I am experiencing the same issue. The culprit appears to be an exception that happens earlier. In my case, an exception is thrown inside eventHandlingThread in org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler:

{code}
2022-02-10 12:21:58,913 ERROR [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error writing History Event: org.apache.hadoop.mapreduce.jobhistory.MapAttemptFinishedEvent@5da2cfca
java.io.IOException: All datanodes [DatanodeInfoWithStorage[195.201.110.185:50010,DS-fe52ee42-b47a-4ad1-8d4c-8400d6c95b18,DISK]] are bad. Aborting...
	at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1537)
	at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1472)
	at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1244)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:663)
{code}

The exception is thrown in eventHandlingThread while writing to EventWriter, which passes the event to a DatumWriter<Event>. A JsonEncoder is used together with the DatumWriter, and JsonEncoder uses a Parser for validation during serialization. Apparently the aforementioned IOException leaves the Parser in an invalid state (and eventHandlingThread probably also terminates). Finally, when all tasks are complete, JobHistoryEventHandler tries in serviceStop() to write an event via EventWriter, which results in:

{code}
2022-02-10 12:21:58,994 WARN [Thread-71] org.apache.hadoop.service.CompositeService: When stopping the service JobHistoryEventHandler : org.apache.avro.AvroTypeException: Attempt to process a enum when a item-end was expected.
org.apache.avro.AvroTypeException: Attempt to process a enum when a item-end was expected.
	at org.apache.avro.io.parsing.Parser.advance(Parser.java:93)
	at org.apache.avro.io.JsonEncoder.writeEnum(JsonEncoder.java:234)
	at org.apache.avro.specific.SpecificDatumWriter.writeEnum(SpecificDatumWriter.java:59)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
	at org.apache.hadoop.mapreduce.jobhistory.EventWriter.write(EventWriter.java:95)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.writeEvent(JobHistoryEventHandler.java:1607)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:645)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:443)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:222)
	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:104)
	at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
	at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1855)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:222)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1293)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:653)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:732)
{code}

In my case I increased the replication factor from 1 to 2 (the replication factor was that small because those datanodes belong to a QA environment), which made the "IOException: All datanodes .., are bad." error less likely. One might also try setting {{mapreduce.jobhistory.jhist.format}} to {{binary}}, since BinaryEncoder does not seem to perform validation during serialization; however, I have not checked whether that works. Even if it does, if an exception is thrown while writing an event to HDFS, the event might end up partially written, potentially leaving the events file in a corrupt state.

> MR App Master fails to write the history due to AvroTypeException
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-4950
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4950
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, mr-am
>            Reporter: Devaraj Kavali
>            Priority: Critical
>
> {code:xml}
> 2013-01-19 19:31:27,269 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, writing event MAP_ATTEMPT_STARTED
> 2013-01-19 19:31:27,269 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.CompositeService: Error stopping JobHistoryEventHandler
> org.apache.avro.AvroTypeException: Attempt to process a enum when a array-start was expected.
> 	at org.apache.avro.io.parsing.Parser.advance(Parser.java:93)
> 	at org.apache.avro.io.JsonEncoder.writeEnum(JsonEncoder.java:210)
> 	at org.apache.avro.specific.SpecificDatumWriter.writeEnum(SpecificDatumWriter.java:54)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> 	at org.apache.hadoop.mapreduce.jobhistory.EventWriter.write(EventWriter.java:66)
> 	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.writeEvent(JobHistoryEventHandler.java:825)
> 	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:517)
> 	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.stop(JobHistoryEventHandler.java:346)
> 	at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
> 	at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:445)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:406)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
> 	at java.lang.Thread.run(Thread.java:662)
> 2013-01-19 19:31:27,271 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://hacluster /root/staging-dir/root/.staging/job_1358603069474_0135
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail:
mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
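
For reference, the two mitigations mentioned in the comment above can be sketched as configuration fragments. This is a sketch only: the {{dfs.replication}} value shown assumes a small QA cluster like the one described, and (as the comment notes) the binary jhist format is an untested workaround.

{code:xml}
<!-- hdfs-site.xml: raise block replication so a single bad datanode
     does not abort the write pipeline ("All datanodes ... are bad"). -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

<!-- mapred-site.xml: write .jhist files with Avro's BinaryEncoder,
     which does not run the Parser-based validation that JsonEncoder does.
     Untested workaround; a partially written event could still leave the
     events file corrupt. -->
<property>
  <name>mapreduce.jobhistory.jhist.format</name>
  <value>binary</value>
</property>
{code}

Since {{dfs.replication}} only affects newly created files, replication of already-written files can be raised with the standard shell command {{hdfs dfs -setrep -w 2 <path>}}.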