[ https://issues.apache.org/jira/browse/MAPREDUCE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490348#comment-17490348 ]

Raman Chodźka commented on MAPREDUCE-4950:
------------------------------------------

I am also experiencing this issue.
The culprit appears to be an exception that happens earlier. In my case, an 
exception is thrown inside eventHandlingThread in 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler:
{code}
2022-02-10 12:21:58,913 ERROR [eventHandlingThread] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error writing 
History Event: 
org.apache.hadoop.mapreduce.jobhistory.MapAttemptFinishedEvent@5da2cfca
java.io.IOException: All datanodes 
[DatanodeInfoWithStorage[195.201.110.185:50010,DS-fe52ee42-b47a-4ad1-8d4c-8400d6c95b18,DISK]]
 are bad. Aborting...
        at 
org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1537)
        at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1472)
        at 
org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1244)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:663)
{code}
The exception is thrown in eventHandlingThread while writing to EventWriter, 
which passes the event to a DatumWriter<Event>. The DatumWriter is used with a 
JsonEncoder, and JsonEncoder relies on a Parser to validate the event against 
the schema during serialization.
Apparently the aforementioned IOException leaves the Parser in an invalid state 
(and eventHandlingThread probably terminates as well).
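The failure mode can be illustrated with a toy model (this is a hypothetical 
sketch, not Avro's actual Parser implementation): a validating streaming 
encoder tracks which grammar symbols are still expected for the open record, 
and an I/O error that aborts a write mid-record leaves that state out of sync, 
so the next write fails with a type error instead of an I/O error:

```python
class AvroTypeError(Exception):
    pass

class ToyJsonEncoder:
    """Toy validating encoder (hypothetical, not Avro's real code):
    every record is the symbol sequence enum -> item-end, and each
    symbol is checked against the grammar as it is written."""

    RECORD = ["enum", "item-end"]

    def __init__(self, sink):
        self.sink = sink       # callable(symbol); may raise IOError
        self.expected = []     # symbols still pending for the open record

    def advance(self, symbol):
        if not self.expected:
            self.expected = list(self.RECORD)   # open a new record
        if self.expected[0] != symbol:
            raise AvroTypeError(
                f"Attempt to process a {symbol} when a "
                f"{self.expected[0]} was expected.")
        self.sink(symbol)      # an IOError here aborts the write...
        self.expected.pop(0)   # ...before the grammar state is advanced

    def write_event(self):
        self.advance("enum")
        self.advance("item-end")


writes = {"n": 0}
def flaky_sink(symbol):
    """Simulates the datanode pipeline dying mid-record."""
    writes["n"] += 1
    if writes["n"] == 2:
        raise IOError("All datanodes are bad. Aborting...")

enc = ToyJsonEncoder(flaky_sink)

try:
    enc.write_event()          # IOError after 'enum' was consumed
except IOError as e:
    print("first failure:", e)

try:
    enc.write_event()          # grammar state is now out of sync
except AvroTypeError as e:
    print("second failure:", e)
```

The second write_event() here corresponds to the write attempted later from 
serviceStop(): the encoder still expects item-end from the aborted record, so 
handing it the next event's enum produces the "Attempt to process a enum when 
a item-end was expected" error seen in the log.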

Finally, when all tasks are complete, JobHistoryEventHandler's serviceStop() 
tries to write one more event via the same EventWriter, which results in

{code}
2022-02-10 12:21:58,994 WARN [Thread-71] 
org.apache.hadoop.service.CompositeService: When stopping the service 
JobHistoryEventHandler : org.apache.avro.AvroTypeException: Attempt to process 
a enum when a item-end was expected.
org.apache.avro.AvroTypeException: Attempt to process a enum when a item-end 
was expected.
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:93)
        at org.apache.avro.io.JsonEncoder.writeEnum(JsonEncoder.java:234)
        at 
org.apache.avro.specific.SpecificDatumWriter.writeEnum(SpecificDatumWriter.java:59)
        at 
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67)
        at 
org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
        at 
org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
        at 
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
        at 
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
        at 
org.apache.hadoop.mapreduce.jobhistory.EventWriter.write(EventWriter.java:95)
        at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.writeEvent(JobHistoryEventHandler.java:1607)
        at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:645)
        at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:443)
        at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:222)
        at 
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
        at 
org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:104)
        at 
org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
        at 
org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
        at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1855)
        at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:222)
        at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1293)
        at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:653)
        at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:732)
{code}

In my case I increased the replication factor from 1 to 2 (the replication 
factor was that low because those datanodes belong to a QA environment), which 
made the "IOException: All datanodes ... are bad" error less likely.

One might also try setting {{mapreduce.jobhistory.jhist.format}} to {{binary}}, 
since BinaryEncoder does not appear to perform validation during serialization. 
I have not verified that this works. Even if it does, an exception thrown while 
writing an event to hdfs could still leave the event partially written, 
potentially leaving the events file in a corrupt state.
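For reference, the format switch is an ordinary job configuration property; a 
minimal mapred-site.xml fragment (assuming a Hadoop version that supports this 
property) would look like:
{code:xml}
<property>
  <name>mapreduce.jobhistory.jhist.format</name>
  <value>binary</value>
</property>
{code}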


> MR App Master fails to write the history due to AvroTypeException
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-4950
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4950
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, mr-am
>            Reporter: Devaraj Kavali
>            Priority: Critical
>
> {code:xml}
> 2013-01-19 19:31:27,269 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, 
> writing event MAP_ATTEMPT_STARTED
> 2013-01-19 19:31:27,269 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.yarn.service.CompositeService: Error stopping 
> JobHistoryEventHandler
> org.apache.avro.AvroTypeException: Attempt to process a enum when a 
> array-start was expected.
>       at org.apache.avro.io.parsing.Parser.advance(Parser.java:93)
>       at org.apache.avro.io.JsonEncoder.writeEnum(JsonEncoder.java:210)
>       at 
> org.apache.avro.specific.SpecificDatumWriter.writeEnum(SpecificDatumWriter.java:54)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
>       at 
> org.apache.hadoop.mapreduce.jobhistory.EventWriter.write(EventWriter.java:66)
>       at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.writeEvent(JobHistoryEventHandler.java:825)
>       at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:517)
>       at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.stop(JobHistoryEventHandler.java:346)
>       at 
> org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
>       at 
> org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
>       at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:445)
>       at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:406)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
>       at java.lang.Thread.run(Thread.java:662)
> 2013-01-19 19:31:27,271 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory 
> hdfs://hacluster /root/staging-dir/root/.staging/job_1358603069474_0135
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
