[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158540#comment-14158540
 ] 

Jason Lowe commented on MAPREDUCE-5873:
---------------------------------------

[~jrottinghuis], agree the current method of computing bandwidth is broken.

Thanks for the patch, [~l201514].  The patch has gone stale, can you please 
rebase on trunk?  Other comments:

Why is the following change necessary?  I don't see how mapOutput can be null, 
but maybe I'm missing something:
{code}
     if (!finishedMaps[mapIndex]) {
-      output.commit();
+      if (output != null) {
+        output.commit();
+      }
{code}

We should provide a sizing hint to the ArrayList constructor during 
computeTotalCopyMills since we know worst-case how big it should be 
(intervals.size()+1) and we can avoid reallocation/copy during add.

In the test, the comment doesn't match the code in a few places:
{code}
      //100MB from 50s to 100s
      bytes = (long)100 * 1024 * 1024;
      scheduler.copySucceeded(attemptID3, new MapHost(null, null), bytes, 
200000, 300000, null);
...
      //100MB from 50s to 100s
      bytes = (long)100 * 1024 * 1024;
      scheduler.copySucceeded(attemptID4, new MapHost(null, null), bytes, 
100000, 150000, null);
{code}

Would like to see the unit test exercise all conditions of the interval adding 
code, such as adding a large overlapping interval (e.g.: 0-500s) late.

Nit: Not sure the intervals.size() == 0 check is worth the extra code, 
especially if we just initialize intervals to Collections.emptyList().

Nit: it would be nice to be consistent about Mills vs Millis.  I prefer Millis 
myself, as it's used elsewhere in the code, or maybe CopyMillsTracker should 
just be CopyTimeTracker with a getCopyMillis method.

Nit: The name TestMultipleTestSucceeded didn't convey to me what it's actually 
testing.  Maybe TestAggregatedTransferRate would be more descriptive?

> Measure bw of a single copy call and display the correct aggregated bw
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5873
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5873
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Siqi Li
>            Assignee: Siqi Li
>         Attachments: MAPREDUCE-5873.v1.patch, MAPREDUCE-5873.v2.patch, 
> MAPREDUCE-5873.v3.patch
>
>
> Currently ShuffleScheduler in ReduceTask JVM status displays bandwidth. Its 
> definition however is confusing because it captures the time where there is 
> no copying because there is a pause between when new wave of map outputs is 
> available.
> current bw is definded as (bytes copied so far) / (total time in the copy 
> phase so far)
> It would be more useful 
> 1) to measure bandwidth of a single copy call.
> 2) display aggregated bw as long as there is at least one fetcher is in the 
> copy call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to