[jira] [Commented] (MAPREDUCE-5186) mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail

Sangjin Lee (JIRA) Fri, 25 Oct 2013 09:08:25 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805408#comment-13805408 ]


Sangjin Lee commented on MAPREDUCE-5186:
----------------------------------------

I am also comfortable with not logging, but I wasn't sure whether the concern
that originally motivated the max split locations limit is still valid (at
least in Hadoop 2).

I am more concerned about truncating the split locations. I'm not 100% familiar
with this part of the code, but by truncating, aren't we dropping some of these
locations on the floor? When the splits are then run on the cluster, would that
always lead to working splits and mappers? It wasn't clear to me whether it
would always work if we dropped these locations when we write the splits. I
would appreciate it if you could shed some light on this. And yes, if this is
a valid concern, it would be a valid concern in MR1 too.
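To make the truncation question concrete, here is a minimal Python sketch of what "truncating" means, assuming truncation simply keeps the first N hosts a split reports and drops the rest. `CombinedSplit` and `truncate_locations` are hypothetical names for illustration, not Hadoop classes or methods. In this toy model only the host list shrinks while the file paths stay intact; whether the real framework treats the dropped hosts purely as scheduling hints is exactly the open question above.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical model of a combined split: the files it reads plus the hosts
# that store its blocks. This is NOT Hadoop's CombineFileSplit class.
@dataclass(frozen=True)
class CombinedSplit:
    paths: Tuple[str, ...]
    locations: Tuple[str, ...]


def truncate_locations(split: CombinedSplit, max_locations: int) -> CombinedSplit:
    """Assumed truncation rule: keep the first max_locations hosts, drop the rest."""
    if len(split.locations) <= max_locations:
        return split
    # Only the host list is cut; the file paths are carried over unchanged.
    return CombinedSplit(split.paths, split.locations[:max_locations])


split = CombinedSplit(
    paths=("/data/part-0", "/data/part-1"),
    locations=tuple(f"node{i:02d}" for i in range(15)),
)
trimmed = truncate_locations(split, 10)
print(len(trimmed.locations))         # 10
print(trimmed.paths == split.paths)   # True
```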

> mapreduce.job.max.split.locations causes some splits created by 
> CombineFileInputFormat to fail
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5186
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5186
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 2.0.4-alpha, 2.2.0
>            Reporter: Sangjin Lee
>            Assignee: Robert Parker
>            Priority: Critical
>         Attachments: MAPREDUCE-5186v1.patch, MAPREDUCE-5186v2.patch
>
>
> CombineFileInputFormat can easily create splits that can come from many 
> different locations (during the last pass of creating "global" splits). 
> However, we observe that this often runs afoul of the 
> mapreduce.job.max.split.locations check that's done by JobSplitWriter.
> The default value for mapreduce.job.max.split.locations is 10, and with any
> decent-sized cluster, CombineFileInputFormat creates splits that are well
> above this limit.
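A minimal Python sketch of the failure mode described above, assuming the check simply rejects any split whose host count exceeds the limit (modelled here as a `ValueError`, since the issue reports that such splits fail). `combined_hosts` and `check_split_locations` are hypothetical helpers, not Hadoop APIs, and the block layout (six blocks, each on three distinct hosts) is made-up example data.

```python
from typing import Iterable, List

# Default limit quoted in the issue description.
DEFAULT_MAX_SPLIT_LOCATIONS = 10


def combined_hosts(block_hosts: Iterable[Iterable[str]]) -> List[str]:
    """Union of the hosts of every block in a combined split, in first-seen order."""
    return list(dict.fromkeys(host for hosts in block_hosts for host in hosts))


def check_split_locations(hosts: List[str],
                          max_locations: int = DEFAULT_MAX_SPLIT_LOCATIONS) -> None:
    """Hypothetical stand-in for the JobSplitWriter check: reject oversized splits."""
    if len(hosts) > max_locations:
        raise ValueError(
            f"split has {len(hosts)} locations, limit is {max_locations}")


# Made-up layout: six blocks, each stored on three distinct hosts.
blocks = [[f"node{3 * i + r:02d}" for r in range(3)] for i in range(6)]
hosts = combined_hosts(blocks)
print(len(hosts))  # 18
try:
    check_split_locations(hosts)
except ValueError as err:
    print(err)  # split has 18 locations, limit is 10
```

Because a combined split's locations are modelled as the union of its blocks' hosts, even a handful of blocks spread across the cluster pushes it past the default of 10.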



--
This message was sent by Atlassian JIRA
(v6.1#6144)
