[ https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975212#comment-13975212 ]

David Rosenstrauch commented on MAPREDUCE-5402:
-----------------------------------------------

Looks like the patch is working.  I tested it on a few heavy-load jobs today, 
and it definitely seemed to work around the "Too many chunks created" error I 
was having.  I probably still need to tune the params a bit to optimize the 
distcp runs I'm doing, but the basic functionality does seem to work.  (Note: 
I did my testing using your patch with the backported version of distcp at 
https://github.com/QwertyManiac/hadoop-distcp-mr1, as described in 
https://issues.cloudera.org/browse/DISTRO-420, not the current Hadoop trunk 
version of the code.)
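
For reference, my invocations of the patched build look roughly like the 
following.  (The max-chunks property name here is an assumption on my part, 
taken from the build I tested; the name that lands in trunk may differ.  The 
split-ratio property is the existing mapred.listing.split.ratio, and the 
paths are placeholders.)

    hadoop distcp \
        -Dmapred.listing.split.ratio=10 \
        -Ddistcp.dynamic.max.chunks.tolerable=10000 \
        -strategy dynamic -m 512 \
        hdfs://namenode/src/path s3n://bucket/dest/path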

I'm still seeing a bit of long-tail behavior, but now it's on the order of 100 
mappers taking longer to complete rather than 1 or 2, which indicates that the 
copy is being distributed much more evenly.

Here are some stats recorded from a few of these jobs:

Job 1:
Total number of files:                  20,027
Number of files copied:                 20,017
Number of files skipped:                10
Number of bytes copied:                 84,802,510,328
Number of mappers:                      512
Split ratio:                            10
Max chunks tolerable:                   10,000
Number of dynamic-chunk-files created:  5,012
Run time:                               5 min, 19 sec

Job 2:
Total number of files:                  36,374
Number of files copied:                 17,160
Number of files skipped:                19,214
Number of bytes copied:                 1,196,591,437,407
Number of mappers:                      512
Split ratio:                            10
Max chunks tolerable:                   10,000
Number of dynamic-chunk-files created:  4,714
Run time:                               50 min, 50 sec

The chunk counts line up with expectations: 512 mappers x a split ratio of 10 
allows up to 5,120 chunks, and roughly 5,000 were created in each job.  Job 2 
can obviously be optimized a bit further (i.e., it will distribute much better 
if I eliminate all those files being skipped), but this is still a big 
improvement - that job used to take over 2 hours to run.

Thanks again for working on the fix for this issue.  If you have any 
additional questions, please let me know.

> DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5402
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp, mrv2
>            Reporter: David Rosenstrauch
>            Assignee: Tsuyoshi OZAWA
>         Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, 
> MAPREDUCE-5402.3.patch
>
>
> In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author 
> describes the implementation of DynamicInputFormat, with one of the main 
> motivations cited being to reduce the chance of long tails, where a few 
> leftover mappers run much longer than the rest.
> However, today I ran into a situation where I experienced exactly such a 
> long tail using DistCpV2 and DynamicInputFormat.  When I tried to alleviate 
> the problem by overriding the number of mappers and the split ratio used by 
> DynamicInputFormat, I was prevented from doing so by the hard-coded limit 
> set by the MAX_CHUNKS_TOLERABLE constant (currently set to 400).
> This constant is actually set quite low for production use.  (See a 
> description of my use case below.)  And although MAPREDUCE-2765 states that 
> this is an "overridable maximum", reading through the code there does not 
> actually appear to be any mechanism to override it.
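> To make "overridable" concrete, a minimal sketch of what I have in mind 
> (the property name is hypothetical - nothing like it exists in the current 
> code):
>
>     // Hypothetical: read the cap from the job's
>     // org.apache.hadoop.conf.Configuration, keeping the current
>     // hard-coded value of 400 as the default.
>     int maxChunksTolerable =
>         conf.getInt("distcp.dynamic.max.chunks.tolerable", 400);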
> This should be changed: it should be possible to expand the maximum number 
> of chunks beyond this arbitrary limit.
> For example, here is the situation I ran into today:
> I ran a DistCpV2 job on a cluster of 8 machines with 128 map slots.  The 
> job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode 
> the number of mappers for the job from the default of 20 to 128, so as to 
> parallelize the copy across the cluster properly.  The number of chunk 
> files created was calculated as 241, and mapred.num.entries.per.chunk was 
> calculated as 12.
> As the job ran on, it reached a point where only 4 map tasks remained, each 
> of which had been running for over 2 hours.  The reason was that each of 
> the 12 files those mappers were copying was quite large (several hundred 
> megabytes) and took ~20 minutes to copy.  Meanwhile, all of the other 124 
> mappers sat idle.
> In theory I should be able to alleviate this problem with 
> DynamicInputFormat.  If I could, say, quadruple the number of chunk files 
> created (241 x 4 = 964), each chunk would contain only ~3 files (2800 / 
> 964), and these large files would have been distributed better around the 
> cluster and copied in parallel.
> However, when I tried to do that - by overriding 
> mapred.listing.split.ratio to, say, 10 - DynamicInputFormat responded with 
> an exception ("Too many chunks created with splitRatio:10, numMaps:128. 
> Reduce numMaps or decrease split-ratio to proceed.") - presumably because 
> 128 maps x a split ratio of 10 = 1,280 chunks, which exceeds the 
> MAX_CHUNKS_TOLERABLE value of 400.
> Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I 
> can't personally see any.
> If this limit has no particular logic behind it, then it should be made 
> overridable - or, better yet, removed altogether.  After all, I'm not sure 
> I see any need for it.  Even if numMaps * splitRatio resulted in an 
> extraordinarily large number, if the code were modified so that the number 
> of chunks were calculated as Math.min(numMaps * splitRatio, numFiles), then 
> there would be no need for MAX_CHUNKS_TOLERABLE.  In the worst-case 
> scenario, where the product of numMaps and splitRatio is large, capping the 
> number of chunks at the number of files (numberOfChunks = numberOfFiles) 
> would result in 1 file per chunk - the maximum parallelization possible.  
> That may not be the best-tuned solution for some users, but I would think 
> it should be left up to the user to deal with the potential consequences of 
> not having tuned their job properly.  Certainly that would be better than 
> an arbitrary hard-coded limit that *prevents* proper parallelization when 
> dealing with large files and/or large numbers of mappers.
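> To make the proposal concrete, here is a rough sketch of the calculation I 
> have in mind (a hypothetical helper, not the actual DynamicInputFormat 
> code):
>
>     // Proposed: derive the chunk count from the job itself rather than
>     // capping it with a hard-coded MAX_CHUNKS_TOLERABLE constant.  The
>     // worst case is one file per chunk - maximum possible parallelization.
>     static int proposedNumChunks(int numMaps, int splitRatio, int numFiles) {
>         return Math.min(numMaps * splitRatio, numFiles);
>     }
>
> With my job above, that would give Math.min(128 * 10, 2800) = 1,280 chunks, 
> i.e. ~2-3 files per chunk.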



--
This message was sent by Atlassian JIRA
(v6.2#6252)
