[ https://issues.apache.org/jira/browse/MAPREDUCE-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616635#comment-13616635 ]

Jason Lowe commented on MAPREDUCE-5114:
---------------------------------------

Stack trace from second AM attempt:

{noformat}
2013-03-27 02:25:48,995 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.apache.hadoop.mapreduce.jobhistory.Event
        at org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:87)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.readJustAMInfos(MRAppMaster.java:1042)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:964)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1271)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1221)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1267)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1226)
2013-03-27 02:25:48,998 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. Signaling RMCommunicator and JobHistoryEventHandler.
{noformat}

Unfortunately, the first attempt's history file had already been deleted by
the time I noticed this error, so I don't have the offending input data for
reference.  However, the backtrace shows that Avro can throw exceptions other
than IOException when parsing history files (here, a ClassCastException from
EventReader.getNextEvent), and that shouldn't cause the AM attempt to fail
outright.  Instead, the AM should warn that the prior AM attempt information
cannot be retrieved and then move on, just as it does for IOException.
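
To illustrate the behavior suggested above, here is a minimal standalone
sketch, not the actual MRAppMaster code or a proposed patch.  HistorySource,
PriorAttemptInfoDemo, and readPriorAmInfos are hypothetical names invented for
this example, and returning an empty list on failure is an assumption about
what "move on" would mean.  It only shows a read loop that treats unchecked
parse failures the same way as IOException.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a history-file event reader; not the Hadoop API. */
interface HistorySource {
    /** Returns the next AM-info record, or null at end of input. */
    String next() throws IOException;
}

class PriorAttemptInfoDemo {

    /**
     * Reads all prior-attempt records. Any IOException or unchecked exception
     * from the parser is logged as a warning and an empty list is returned,
     * so the caller can continue starting up (assumed recovery policy).
     */
    static List<String> readPriorAmInfos(HistorySource source) {
        List<String> infos = new ArrayList<>();
        try {
            String record;
            while ((record = source.next()) != null) {
                infos.add(record);
            }
        } catch (IOException | RuntimeException e) {
            // Same handling for checked and unchecked parse failures.
            System.err.println("WARN: could not read prior AM attempt info: " + e);
            return Collections.emptyList();
        }
        return infos;
    }

    public static void main(String[] args) {
        // A well-formed source yields its records.
        Iterator<String> it = Arrays.asList("attempt_1").iterator();
        HistorySource good = () -> it.hasNext() ? it.next() : null;
        System.out.println(readPriorAmInfos(good));

        // A source that fails the way the backtrace did no longer aborts startup.
        HistorySource bad = () -> {
            throw new ClassCastException("GenericData$Record cannot be cast to Event");
        };
        System.out.println(readPriorAmInfos(bad));
    }
}
```

Catching RuntimeException rather than only ClassCastException is a deliberate
choice in this sketch, since the exact set of unchecked exceptions the parser
can raise on bad input is not known from this one backtrace.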
                
> Subsequent AM attempt can crash trying to read prior AM attempt information
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5114
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5114
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.6
>            Reporter: Jason Lowe
>
> Saw the second AM attempt of a job fail early during startup because it tried 
> to read the AMInfos from the previous attempt's history file and hit an error 
> that wasn't an IOException.  Stack trace to follow.

