[
https://issues.apache.org/jira/browse/MAPREDUCE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122316#comment-14122316
]
Binglin Chang commented on MAPREDUCE-6067:
------------------------------------------
Thanks for the comments Manu and Sean.
bq. Since the new added line lies in the critical path of performance. May be
it is risky to change here?
1. The added code just increase 2 counters, the performance impact should be
negligible, and we need a way to get the counter number right? java side also
increase counters for every kv pair.
bq. For these two, they are not inited in the constructor.
2. I will move all the added counters to MapOutputCollector, and added them to
constructor
bq. Log is removed due to it is too noisy? The log was added after real pain
and practices in troubleshootings
3. In common practice, log added when troubleshooting bug should be remove
after the bug is found and fixed. Too bad we don't have debug log level. I
removed 2 logs(spill file path and buffer is full), spill file path is useful
only for debugging only, and buffer is full is very common situation, and is
always implied by later mid-spill log, don't see why it is useful anymore when
the bug is already gone.
bq. Can we make normalJob and nativeJob local var instead of field member?
Since it is a test file, Test case should share nothing except immutable
things defined in test setup.
4. Simple use local var doesn't work, if we want to eliminate field member, we
need a way to get both outputpath and job from sub-methods(runNativeTest,
runNormalTest), perhaps just inline them into test method, this is lot change
compare to current approach, if you think it's OK, I will make more aggressive
changes.
bq. By deleting the cleanup code, have you confirmed that it will leak any
garbage file on local disk?
5 . I see you already add cleanUp method to remove root dir, so the old
cleanup code is removed
bq. It is confusing by looking at this line of change.
6. It's confusing, again see comment 4
bq. Maybe we can add some message in assert
7. When assertion fails, the failed line(number?) is showed, which already have
Counter name info, I though that should be enough. OK, will add some message
bq. I cannot find you use this counter anywhere.
Here, it's in c++ code
{code}
@@ -157,6 +158,8 @@ void MapOutputCollector::configure(Config * config) {
Counter * spilledRecord =
NativeObjectFactory::GetCounter(TaskCounters::TASK_COUNTER_GROUP,
TaskCounters::SPILLED_RECORDS);
+ Counter * materializedBytes =
NativeObjectFactory::GetCounter(TaskCounters::TASK_COUNTER_GROUP,
+ TaskCounters::MAP_OUTPUT_MATERIALIZED_BYTES);
ICombineRunner * combiner = NULL;
if (NULL != config->get(NATIVE_COMBINER)
{code}
> native-task: spilled records counter is incorrect
> -------------------------------------------------
>
> Key: MAPREDUCE-6067
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6067
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: task
> Reporter: Todd Lipcon
> Assignee: Binglin Chang
> Attachments: MAPREDUCE-6067.v1.patch, MAPREDUCE-6067.v2.patch,
> MAPREDUCE-6067.v3.patch, MAPREDUCE-6067.v4.patch, native-counters.html,
> trunk-counters.html
>
>
> After running a terasort, I see the spilled records counter at 5028651606,
> which is about half what I expected to see. Using the non-native collector I
> see the expected count of 10000000000. It seems the correct number of records
> were indeed spilled, because the job's output record count is correct.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)