[ https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805408#comment-13805408 ]
Sangjin Lee commented on MAPREDUCE-5186:
----------------------------------------
I am also comfortable with not logging, but I wasn't sure whether the concern
that led to the max split locations limit is no longer valid (at least in
Hadoop 2).
I am more concerned about truncating the split locations. I'm not 100% familiar
with this part of the code, but by truncating, aren't we dropping some of these
locations on the floor? When the splits are then run on the cluster, would that
always lead to working splits and mappers? It wasn't clear to me whether things
would always work if we dropped these locations when we write the splits. I
would appreciate it if you could shed some light on this. And yes, if this is a
valid concern, it sounds like it would be a valid concern in MR1 too.
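
To make the truncation question concrete, here is a minimal sketch of what
"dropping locations on the floor" could look like. This is not the code from
the attached patches; it assumes truncation simply keeps the first N hosts, and
the class name SplitLocationTruncation and the method truncate are hypothetical
names used only for illustration.

```java
import java.util.Arrays;

public class SplitLocationTruncation {

    // Hypothetical helper (not part of Hadoop): keep at most maxLocations
    // hosts and silently drop the rest. Assumes maxLocations >= 0.
    static String[] truncate(String[] locations, int maxLocations) {
        if (locations.length <= maxLocations) {
            return locations; // nothing to drop
        }
        return Arrays.copyOf(locations, maxLocations);
    }

    public static void main(String[] args) {
        String[] hosts = {"host1", "host2", "host3", "host4"};
        String[] kept = truncate(hosts, 2);
        // host3 and host4 are no longer recorded for this split.
        System.out.println(Arrays.toString(kept)); // prints [host1, host2]
    }
}
```

Under this assumption the split keeps only a subset of the hosts it was built
from, which is exactly the part I would like to understand the consequences of.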
> mapreduce.job.max.split.locations causes some splits created by
> CombineFileInputFormat to fail
> ----------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5186
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5186
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 2.0.4-alpha, 2.2.0
> Reporter: Sangjin Lee
> Assignee: Robert Parker
> Priority: Critical
> Attachments: MAPREDUCE-5186v1.patch, MAPREDUCE-5186v2.patch
>
>
> CombineFileInputFormat can easily create splits that can come from many
> different locations (during the last pass of creating "global" splits).
> However, we observe that this often runs afoul of the
> mapreduce.job.max.split.locations check that's done by JobSplitWriter.
> The default value for mapreduce.job.max.split.locations is 10, and with any
> decent size cluster, CombineFileInputFormat creates splits that are well
> above this limit.
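
For illustration only, the failure described above can be sketched as a simple
bounds check. This is not the actual JobSplitWriter code; the class
SplitLocationCheck, the method checkLocations, and the exception message are
hypothetical. Only the default of 10 is taken from the description.

```java
import java.io.IOException;

public class SplitLocationCheck {

    // Default of mapreduce.job.max.split.locations, per the issue description.
    static final int DEFAULT_MAX_SPLIT_LOCATIONS = 10;

    // Hypothetical check: reject a split that lists more hosts than allowed.
    static void checkLocations(String[] locations, int maxLocations)
            throws IOException {
        if (locations.length > maxLocations) {
            throw new IOException("split has " + locations.length
                    + " locations, limit is " + maxLocations);
        }
    }

    public static void main(String[] args) {
        // A combined split drawn from 25 hosts, as might happen on a
        // decent size cluster.
        String[] hosts = new String[25];
        for (int i = 0; i < hosts.length; i++) {
            hosts[i] = "host" + i;
        }
        try {
            checkLocations(hosts, DEFAULT_MAX_SPLIT_LOCATIONS);
            System.out.println("accepted");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```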