[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122316#comment-14122316
 ] 

Binglin Chang commented on MAPREDUCE-6067:
------------------------------------------

Thanks for the comments Manu and Sean. 
bq. Since the new added line lies in the critical path of performance. May be 
it is risky to change here?
1. The added code just increase 2 counters, the performance impact should be 
negligible, and we need a way to get the counter number right? java side also 
increase counters for every kv pair.
bq. For these two, they are not inited in the constructor. 
2. I will move all the added counters to MapOutputCollector, and added them to 
constructor
bq. Log is removed due to it is too noisy? The log was added after real pain 
and practices in troubleshootings
3. In common practice, log added when troubleshooting bug should be remove 
after the bug is found and fixed. Too bad we don't have debug log level. I 
removed 2 logs(spill file path and buffer is full), spill file path is useful 
only for debugging only, and buffer is full is very common situation, and is 
always implied by later mid-spill log, don't see why it is useful anymore when 
the bug is already gone. 
bq. Can we make normalJob and nativeJob local var instead of field member? 
Since it is a test file, Test case should share nothing except  immutable 
things defined in test setup.
4. Simple use local var doesn't work, if we want to eliminate field member, we 
need a way to get both outputpath and job from sub-methods(runNativeTest, 
runNormalTest), perhaps just inline them into test method, this is lot change 
compare to current approach, if you think it's OK, I will make more aggressive 
changes. 
bq. By deleting the cleanup code, have you confirmed that it will leak any 
garbage file on local disk?
5 . I see you already add cleanUp method to remove root dir, so the  old 
cleanup code is removed
bq. It is confusing by looking at this line of change.
6. It's confusing, again see comment 4
bq. Maybe we can add some message in assert
7. When assertion fails, the failed line(number?) is showed, which already have 
Counter name info, I though that should be enough. OK, will add some message
bq. I cannot find you use this counter anywhere.
Here, it's in c++ code
{code}
@@ -157,6 +158,8 @@ void MapOutputCollector::configure(Config * config) {

   Counter * spilledRecord = 
NativeObjectFactory::GetCounter(TaskCounters::TASK_COUNTER_GROUP,
       TaskCounters::SPILLED_RECORDS);
+  Counter * materializedBytes = 
NativeObjectFactory::GetCounter(TaskCounters::TASK_COUNTER_GROUP,
+      TaskCounters::MAP_OUTPUT_MATERIALIZED_BYTES);

   ICombineRunner * combiner = NULL;
   if (NULL != config->get(NATIVE_COMBINER)
{code}


> native-task: spilled records counter is incorrect
> -------------------------------------------------
>
>                 Key: MAPREDUCE-6067
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6067
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: task
>            Reporter: Todd Lipcon
>            Assignee: Binglin Chang
>         Attachments: MAPREDUCE-6067.v1.patch, MAPREDUCE-6067.v2.patch, 
> MAPREDUCE-6067.v3.patch, MAPREDUCE-6067.v4.patch, native-counters.html, 
> trunk-counters.html
>
>
> After running a terasort, I see the spilled records counter at 5028651606, 
> which is about half what I expected to see. Using the non-native collector I 
> see the expected count of 10000000000. It seems the correct number of records 
> were indeed spilled, because the job's output record count is correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to