[
https://issues.apache.org/jira/browse/MAPREDUCE-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Payne updated MAPREDUCE-4937:
----------------------------------
Attachment: MAPREDUCE-4937.MRAMHandlOversizeSplits.txt
Thank you for the comments, Jason.
bq. ◾Instead of duplicating the super.serviceStart() and having a mid-method
return, we could do something like this:
I rearranged the code as suggested, with one adjustment: rather than setting
initFailed to true if the internal state is NEW, initFailed is set to true if
the internal state is anything other than INITED.
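For illustration only, here is a rough, self-contained sketch of that flow. It is not the patch itself: InitFailedSketch, Job, AppMaster, the two-value JobState enum, and the recorded log strings are hypothetical stand-ins, and the ordering of the steps is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; this is not the MRAppMaster source.
class InitFailedSketch {

    // Simplified internal job states (assumption: only these two matter here).
    enum JobState { NEW, INITED }

    // Stand-in job whose initialization either succeeds or leaves it un-INITED.
    static class Job {
        JobState state = JobState.NEW;
        private final boolean initSucceeds;

        Job(boolean initSucceeds) { this.initSucceeds = initSucceeds; }

        void handleInit() {
            if (initSucceeds) {
                state = JobState.INITED;
            }
            // On failure the state is simply left as it was (NEW here).
        }
    }

    // Stand-in application master that records what it does, in order.
    static class AppMaster {
        final List<String> log = new ArrayList<>();

        void serviceStart(Job job) {
            log.add("JOB_INIT");
            job.handleInit();

            // Anything other than INITED counts as a failed init, not just NEW.
            boolean initFailed = job.state != JobState.INITED;

            // Recorded exactly once: no duplicated call, no mid-method return.
            log.add("super.serviceStart");

            log.add(initFailed ? "JOB_INIT_FAILED" : "JOB_START");
        }
    }

    public static void main(String[] args) {
        AppMaster ok = new AppMaster();
        ok.serviceStart(new Job(true));
        System.out.println(ok.log);

        AppMaster failed = new AppMaster();
        failed.serviceStart(new Job(false));
        System.out.println(failed.log);
    }
}
```

Testing for "not INITED" instead of "is NEW" means that any unexpected state after the init attempt is treated as a failed initialization.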
bq. ◾Nit: JOB_INIT_FAILED should be listed by the other events produced by
MRAppMaster (i.e.: JOB_INIT, JOB_START) so like the other events it's
documented where the event originates
Moved JOB_INIT_FAILED so that it is listed with the other events produced by
MRAppMaster.
bq. ◾Nit: would be nice to have some better indentation on the wrapped lines in
the test case, since those lines have smaller indents than the method
declaration
Fixed the indentation of the wrapped lines in the test case.
> MR AM handles an oversized split metainfo file poorly
> -----------------------------------------------------
>
> Key: MAPREDUCE-4937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4937
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.0.2-alpha, 0.23.5
> Reporter: Jason Lowe
> Assignee: Eric Payne
> Attachments: MAPREDUCE-4937.MRAMHandlOversizeSplits.txt,
> MAPREDUCE-4937.MRAMHandlOversizeSplits.txt,
> MAPREDUCE-4937.MRAMHandlOversizeSplits.txt
>
>
> When a job runs with a split metainfo file that's larger than the AM has been
> configured to handle, the AM just crashes. This leaves the user with a
> less-than-ideal debug session since there are no useful diagnostic messages
> sent to the client for this failure. In addition it crashes before
> registering/unregistering with the RM and crashes without generating history,
> so the proxy URL is not very useful and there's no archived configuration to
> check to see what setting the AM was using when it encountered the error.
> The AM should handle this error case more gracefully and treat the failure as
> it does any other failed job, with a proper unregistration from the RM and
> with history.