[ 
https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545627
 ] 

owen.omalley edited comment on HADOOP-2284 at 11/26/07 2:21 PM:
-----------------------------------------------------------------

Another important note on this is that the ratio of "overhead" in the compare 
looks really bad. In particular,
org.apache.hadoop.mapred.MergeSort.compare(Object,Object) is taking 2,503 CPU 
seconds, while the actual work, done in 
org.apache.hadoop.io.Text$Comparator.compare(byte[],int,int,byte[],int,int), 
takes only 1,158 seconds. Thus, it looks like there is 64% overhead in the 
abstraction levels wrapped around the compare. Part of that overhead is the 
progress call, but I suspect that we should work on stripping out more of the 
overhead.

  
> BasicTypeSorterBase.compare calls progress on each compare
> ----------------------------------------------------------
>
>                 Key: HADOOP-2284
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2284
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>             Fix For: 0.16.0
>
>
> The inner loop of the sort is calling progress on each compare. I think it 
> would make more sense to call progress in the sort rather than the compare, 
> or at most once every 10,000 compares. In the performance numbers, the calls 
> to progress as part of the sort are consuming 12% of the total CPU time when 
> running word count under the local runner.
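
The idea of reporting progress at most once every N compares could be sketched roughly as follows. This is only an illustration, not the actual Hadoop code or the eventual patch: the class ThrottledProgressSketch, the ProgressCallback interface, the ThrottledComparator wrapper, and the sortAndCount helper are all hypothetical names made up for this example.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: none of these names come from Hadoop.
class ThrottledProgressSketch {

    /** Stand-in for a progress callback (an assumption, not Hadoop's interface). */
    interface ProgressCallback {
        void progress();
    }

    /** Delegates to a real comparator and reports progress once per `interval` compares. */
    static final class ThrottledComparator<T> implements Comparator<T> {
        private final Comparator<T> delegate;
        private final ProgressCallback reporter;
        private final int interval;
        private long compares = 0;

        ThrottledComparator(Comparator<T> delegate, ProgressCallback reporter, int interval) {
            if (interval <= 0) {
                throw new IllegalArgumentException("interval must be positive");
            }
            this.delegate = delegate;
            this.reporter = reporter;
            this.interval = interval;
        }

        @Override
        public int compare(T a, T b) {
            // Only every interval-th compare pays for the progress call.
            if (++compares % interval == 0) {
                reporter.progress();
            }
            return delegate.compare(a, b);
        }

        long compareCount() {
            return compares;
        }
    }

    /** Sorts `data` in place and returns {number of compares, number of progress calls}. */
    static long[] sortAndCount(String[] data, int interval) {
        long[] calls = {0};
        ThrottledComparator<String> cmp = new ThrottledComparator<>(
                Comparator.<String>naturalOrder(), () -> calls[0]++, interval);
        Arrays.sort(data, cmp);
        return new long[] {cmp.compareCount(), calls[0]};
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "kiwi", "plum", "date"};
        long[] counts = sortAndCount(words, 3);
        System.out.println(Arrays.toString(words));
        System.out.println("compares=" + counts[0] + ", progress calls=" + counts[1]);
    }
}
```

With an interval of 10,000, the progress call would run on roughly 0.01% of compares. The other option in the description, calling progress from the sort loop itself, would avoid even the per-compare counter check.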
