[ https://issues.apache.org/jira/browse/HADOOP-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580487#action_12580487 ]

Doug Cutting commented on HADOOP-3046:
--------------------------------------

> BytesWritable doesn't use vints, so the offsets are fixed

Good point.  So the fix is as simple as:

{noformat}
     public int compare(byte[] b1, int s1, int l1,
                        byte[] b2, int s2, int l2) {
-      int size1 = readInt(b1, s1);
-      int size2 = readInt(b2, s2);
-      return compareBytes(b1, s1+4, size1, b2, s2+4, size2);
+      return compareBytes(b1, s1+4, l1-4, b2, s2+4, l2-4);
     }
   }
{noformat}
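
For context, here is how the whole comparator reads after that change (a 
minimal sketch assuming the surrounding BytesWritable.Comparator class and 
the static WritableComparator.compareBytes helper in org.apache.hadoop.io, 
not a verbatim copy of any committed patch):

{noformat}
// Hedged sketch: the BytesWritable raw comparator after the change above.
public static class Comparator extends WritableComparator {
  public Comparator() {
    super(BytesWritable.class);
  }

  public int compare(byte[] b1, int s1, int l1,
                     byte[] b2, int s2, int l2) {
    // The serialized form is a fixed 4-byte length followed by the bytes,
    // so the payload is the trailing l-4 bytes starting at s+4.  The
    // framework already hands us the full record lengths l1 and l2, so
    // there is no need to re-read the 4-byte headers with readInt.
    return compareBytes(b1, s1 + 4, l1 - 4, b2, s2 + 4, l2 - 4);
  }
}
{noformat}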

> Text does, but it already uses WritableUtils::getVIntSize which should be all 
> that's required.

I think this case is a bit more complicated, as I mentioned above.  To 
calculate the length without parsing it from the buffer requires some VInt 
logic that's not in getVIntSize.  We're passed x and we need to compute y, 
where x = getVIntSize(y) + y.
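
Concretely, one way to do that inversion (a minimal sketch, not anything 
that exists in WritableUtils; the helper name decodedLength is made up 
here): a vint for a non-negative int occupies 1 to 5 bytes, so we can try 
each possible header width k and accept the one that is self-consistent 
with getVIntSize:

{noformat}
import org.apache.hadoop.io.WritableUtils;

// Hypothetical helper, not part of any Hadoop API: given the total raw
// length x = getVIntSize(y) + y for a Text record, recover the byte
// count y without reading anything from the buffer.
static int decodedLength(int x) {
  // A vint for a non-negative int takes 1 to 5 bytes, so there are at
  // most five candidates for the header width k.
  for (int k = 1; k <= 5; k++) {
    int y = x - k;
    if (y >= 0 && WritableUtils.getVIntSize(y) == k) {
      return y;
    }
  }
  throw new IllegalArgumentException("inconsistent raw length: " + x);
}
{noformat}

At most one k can pass the check, because getVIntSize is non-decreasing in 
y, so the recovered length is unambiguous.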

> Text and BytesWritable's raw comparators should use the lengths provided 
> instead of rebuilding them from scratch using readInt
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3046
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3046
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.17.0
>
>
> It would be much faster to use the key length provided by the raw compare 
> function rather than rebuilding the integer lengths back up from bytes twice 
> for every comparison in the sort.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
