[
https://issues.apache.org/jira/browse/MAPREDUCE-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158540#comment-14158540
]
Jason Lowe commented on MAPREDUCE-5873:
---------------------------------------
[~jrottinghuis], agree the current method of computing bandwidth is broken.
Thanks for the patch, [~l201514]. The patch has gone stale, can you please
rebase on trunk? Other comments:
Why is the following change necessary? I don't see how mapOutput can be null,
but maybe I'm missing something:
{code}
if (!finishedMaps[mapIndex]) {
- output.commit();
+ if (output != null) {
+ output.commit();
+ }
{code}
We should provide a sizing hint to the ArrayList constructor during
computeTotalCopyMills since we know worst-case how big it should be
(intervals.size()+1) and we can avoid reallocation/copy during add.
In the test, the comment doesn't match the code in a few places:
{code}
//100MB from 50s to 100s
bytes = (long)100 * 1024 * 1024;
scheduler.copySucceeded(attemptID3, new MapHost(null, null), bytes,
200000, 300000, null);
...
//100MB from 50s to 100s
bytes = (long)100 * 1024 * 1024;
scheduler.copySucceeded(attemptID4, new MapHost(null, null), bytes,
100000, 150000, null);
{code}
Would like to see the unit test exercise all conditions of the interval adding
code, such as adding a large overlapping interval (e.g.: 0-500s) late.
Nit: Not sure the intervals.size() == 0 check is worth the extra code,
especially if we just initialize intervals to Collections.emptyList().
Nit: it would be nice to be consistent about Mills vs Millis. I prefer Millis
myself, as it's used elsewhere in the code, or maybe CopyMillsTracker should
just be CopyTimeTracker with a getCopyMillis method.
Nit: The name TestMultipleTestSucceeded didn't convey to me what it's actually
testing. Maybe TestAggregatedTransferRate would be more descriptive?
> Measure bw of a single copy call and display the correct aggregated bw
> ----------------------------------------------------------------------
>
> Key: MAPREDUCE-5873
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5873
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.3.0
> Reporter: Siqi Li
> Assignee: Siqi Li
> Attachments: MAPREDUCE-5873.v1.patch, MAPREDUCE-5873.v2.patch,
> MAPREDUCE-5873.v3.patch
>
>
> Currently ShuffleScheduler in ReduceTask JVM status displays bandwidth. Its
> definition however is confusing because it captures the time where there is
> no copying because there is a pause between when new wave of map outputs is
> available.
> current bw is definded as (bytes copied so far) / (total time in the copy
> phase so far)
> It would be more useful
> 1) to measure bandwidth of a single copy call.
> 2) display aggregated bw as long as there is at least one fetcher is in the
> copy call.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)