[ https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804460#comment-13804460 ]
Sangjin Lee commented on MAPREDUCE-5186:
----------------------------------------
Thanks for the patch [~robsparker]!
Quick question: I noticed there are these calls in the patch:
{code}
locations = Arrays.copyOf(locations, maxBlockLocations);
{code}
What is the reason for truncating the original locations? Wouldn't that
limit the number of block locations to the max block locations? If so, would
splits that truly need to collect blocks from more locations not be
created?
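
To make the concern concrete, here is a minimal stand-alone sketch of what that call does to an array of locations. It is not the patch's code: the class name, the method name, and the host names are made up for illustration, and the length guard is my own assumption (Arrays.copyOf pads with nulls when the new length is larger than the original).

```java
import java.util.Arrays;

// Hypothetical demo class; not part of Hadoop or of the patch.
class TruncateLocationsDemo {
    // Mirrors the call quoted above: keep at most maxBlockLocations entries.
    // The guard is an assumption added here, since Arrays.copyOf would
    // otherwise pad a shorter array with nulls.
    static String[] truncate(String[] locations, int maxBlockLocations) {
        if (locations.length <= maxBlockLocations) {
            return locations;
        }
        return Arrays.copyOf(locations, maxBlockLocations);
    }

    public static void main(String[] args) {
        String[] locations = {"host1", "host2", "host3", "host4"};
        String[] kept = truncate(locations, 2);
        // Only the first two entries survive; host3 and host4 are dropped.
        System.out.println(Arrays.toString(kept)); // prints [host1, host2]
    }
}
```

In this sketch the entries past the limit are simply dropped from the array, which is the behavior the question above is about.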
> mapreduce.job.max.split.locations causes some splits created by
> CombineFileInputFormat to fail
> ----------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5186
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5186
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 2.0.4-alpha, 2.2.0
> Reporter: Sangjin Lee
> Assignee: Robert Parker
> Priority: Critical
> Attachments: MAPREDUCE-5186v1.patch
>
>
> CombineFileInputFormat can easily create splits whose blocks come from many
> different locations (during the last pass of creating "global" splits).
> However, we observe that this often runs afoul of the
> mapreduce.job.max.split.locations check that's done by JobSplitWriter.
> The default value for mapreduce.job.max.split.locations is 10, and on any
> decent-sized cluster, CombineFileInputFormat creates splits whose location
> counts are well above this limit.
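
The failure described in the issue can be illustrated with a simplified stand-in for the limit check. This is a sketch only, not Hadoop's actual JobSplitWriter code: the class name, method name, and message text are hypothetical, and it assumes the check rejects an oversized split by throwing an IOException. The default of 10 is taken from the description above.

```java
import java.io.IOException;

// Hypothetical stand-in for the limit check; not Hadoop's actual code.
class MaxSplitLocationsCheckDemo {
    // Default of mapreduce.job.max.split.locations as stated in the issue.
    static final int DEFAULT_MAX_SPLIT_LOCATIONS = 10;

    // Rejects a split whose location count exceeds the configured maximum.
    static void checkLocations(String[] locations, int maxLocations) throws IOException {
        if (locations.length > maxLocations) {
            throw new IOException("Split has " + locations.length
                + " locations, exceeding the limit of " + maxLocations);
        }
    }

    public static void main(String[] args) {
        // A combined split drawing on 25 hosts, as might happen on a larger cluster.
        String[] locations = new String[25];
        for (int i = 0; i < locations.length; i++) {
            locations[i] = "host" + i;
        }
        try {
            checkLocations(locations, DEFAULT_MAX_SPLIT_LOCATIONS);
            System.out.println("split accepted");
        } catch (IOException e) {
            System.out.println("split rejected: " + e.getMessage());
        }
    }
}
```

Under these assumptions, any split that lists more than 10 locations is rejected unless the limit is raised in the job configuration.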