[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649241#comment-16649241
 ] 

Xiao Chen commented on MAPREDUCE-7132:
--------------------------------------

Thanks [~pbacsko] for filing the Jira and providing a fix!

The patch makes sense to me. Some comments:
 - The EC policy on {{jobSubmitDir}} may not be the same as the policy on 
every file inside it. The only way to know for sure is to call getECPolicy on 
every file, which seems like a non-trivial overhead. If there is no easier 
way, I'm +1 on the current approach as a best effort, with improved handling 
left to a separate Jira.
 - Having each policy's minimum is nice. IMO, when someone sets 
{{mapreduce.job.max.split.locations}} to a lower number and 
{{getMaxBlockLocationsIfECused}} bumps it higher, it'd be good to log that at 
info level (or warn; I'm not sure how important this is for MR behavior).
 - Is it feasible to bump the default for {{mapreduce.job.max.split.locations}} 
from 10 to 15, to accommodate the number of block locations needed by the 
largest system EC policy (RS-10-4, which needs 14)? This would also make sure 
the message mentioned in my previous point isn't logged by default.
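
To illustrate the last two points, here is a minimal sketch in plain Java with 
no Hadoop dependencies. The class name, method name, and log wording are 
hypothetical and are not taken from the attached patches. It only assumes that 
a striped file needs one location per data unit plus one per parity unit, so 
RS-10-4 needs 14.

```java
import java.util.logging.Logger;

/** Hypothetical helper for illustration; not the code from the patch. */
final class SplitLocationLimit {
    private static final Logger LOG =
        Logger.getLogger(SplitLocationLimit.class.getName());

    private SplitLocationLimit() { }

    /**
     * Returns the location limit to apply for a file striped with the given
     * number of data and parity units (10 and 4 for RS-10-4).
     * The configured value is only ever raised, never lowered.
     */
    static int effectiveMaxLocations(
            int configuredMax, int dataUnits, int parityUnits) {
        int required = dataUnits + parityUnits;
        if (required > configuredMax) {
            // Tell the user that the configured value was overridden.
            LOG.info("Configured max split locations " + configuredMax
                + " is below the " + required
                + " locations needed by the EC policy; using " + required);
            return required;
        }
        return configuredMax;
    }
}
```

With the current default of 10, RS-10-4 raises the limit to 14 and logs the 
message. With a default of 15, the same call returns 15 and logs nothing.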

[~haibochen]'s review would be nice since this is MR. :)

> Check erasure coding in JobSplitWriter to avoid warnings
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-7132
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7132
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client, mrv2
>    Affects Versions: 3.1.1
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: MAPREDUCE-7132-001.patch, MAPREDUCE-7132-002.patch, 
> MAPREDUCE-7132-003.patch, MAPREDUCE-7132-004.patch, MAPREDUCE-7132-005.patch
>
>
> Currently, {{JobSplitWriter}} compares the number of hosts for a certain 
> block against a static value that comes from 
> {{mapreduce.job.max.split.locations}}. The default value of this property is 
> 10.
> However, an EC schema like RS-10-4 requires at least 14 hosts. In this case, 
> 14 block locations will be returned and {{JobSplitWriter}} prints a warning, 
> which can confuse users.
> A possible solution could check whether EC is enabled for a block and 
> increase this value dynamically if needed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

