[
https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545627
]
owen.omalley edited comment on HADOOP-2284 at 11/26/07 2:21 PM:
-----------------------------------------------------------------
Another important note on this is that the ratio of "overhead" in the compare
looks really bad. In particular,
org.apache.hadoop.mapred.MergeSort.compare(Object,Object) is taking 2,503 CPU
seconds, while the actual work, done in
org.apache.hadoop.io.Text$Comparator.compare(byte[],int,int,byte[],int,int),
takes only 1,158 seconds. Thus, it looks like there is 64% overhead in the
abstraction levels wrapped around the compare. Part of that overhead is the
progress call, but I suspect that we should work on stripping out more of the
overhead.
was (Author: owen.omalley):
Another important note on this is that the ratio of "overhead" in the
compare looks really bad. In particular,
org.apache.hadoop.mapred.MergeSort.compare(Object,Object) is taking 2,503 cpu
seconds and the work is being done in
org.apache.hadoop.io.Text$Comparator.compare(byte[],int,int,byte[],int,int) is
only 1158 seconds. Thus, it looks like there is 64% overhead in the abstraction
levels wrapped around the compare. Part of that overhead is the progress, but I
suspect that should strip out more of the overhead.
> BasicTypeSorterBase.compare calls progress on each compare
> ----------------------------------------------------------
>
> Key: HADOOP-2284
> URL: https://issues.apache.org/jira/browse/HADOOP-2284
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Reporter: Owen O'Malley
> Assignee: Devaraj Das
> Fix For: 0.16.0
>
>
> The inner loop of the sort is calling progress on each compare. I think it
> would make more sense to call progress in the sort rather than in the
> compare, or at most once every 10,000 compares. In the performance numbers,
> the calls to progress as part of the sort are consuming 12% of the total CPU
> time when running word count under the local runner.
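
For illustration, the "at most every 10000 compares" idea from the description could look roughly like the sketch below. This is not the actual Hadoop code or the committed fix: the class name ThrottledProgressComparator, the use of Runnable as a stand-in for the progress callback, and the interval parameter are all assumptions made for the example.

```java
import java.util.Comparator;

/**
 * Hypothetical sketch: wraps a comparator and reports progress once every
 * `interval` compares instead of on every compare. Not Hadoop's own code.
 */
final class ThrottledProgressComparator<T> implements Comparator<T> {
    private final Comparator<T> delegate; // performs the real comparison
    private final Runnable progress;      // stand-in for a progress callback
    private final int interval;           // e.g. 10000, as suggested in the issue
    private int sinceLastReport = 0;      // compares seen since the last report

    ThrottledProgressComparator(Comparator<T> delegate, Runnable progress, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.delegate = delegate;
        this.progress = progress;
        this.interval = interval;
    }

    @Override
    public int compare(T a, T b) {
        // The common path costs one increment and one branch;
        // the callback runs only on every interval-th compare.
        if (++sinceLastReport >= interval) {
            sinceLastReport = 0;
            progress.run();
        }
        return delegate.compare(a, b);
    }
}
```

The counter is a plain int with no synchronization, so this sketch assumes one comparator instance per sort running on a single thread. Calling progress from the sort loop itself, the other option mentioned in the description, would avoid even the per-compare counter.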