[
https://issues.apache.org/jira/browse/HADOOP-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580487#action_12580487
]
Doug Cutting commented on HADOOP-3046:
--------------------------------------
> BytesWritable doesn't use vints, so the offsets are fixed
Good point. So the fix is as simple as:
{noformat}
 public int compare(byte[] b1, int s1, int l1,
                    byte[] b2, int s2, int l2) {
-  int size1 = readInt(b1, s1);
-  int size2 = readInt(b2, s2);
-  return compareBytes(b1, s1+4, size1, b2, s2+4, size2);
+  return compareBytes(b1, s1+4, l1-4, b2, s2+4, l2-4);
 }
{noformat}
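A quick standalone check of why the simplification is safe: BytesWritable serializes a fixed 4-byte big-endian length prefix followed by the payload, so the size stored in the buffer always equals the raw length minus 4 and readInt can be skipped. This is a sketch, not the real Hadoop classes; readInt and serialize here just mirror what WritableComparator.readInt and BytesWritable.write do.

{noformat}
public class BytesWritableCheck {

  // Big-endian int read, as WritableComparator.readInt does it.
  static int readInt(byte[] b, int s) {
    return ((b[s] & 0xff) << 24) | ((b[s + 1] & 0xff) << 16)
         | ((b[s + 2] & 0xff) << 8) | (b[s + 3] & 0xff);
  }

  // Serialize a payload the way BytesWritable does:
  // 4-byte big-endian length, then the bytes themselves.
  static byte[] serialize(byte[] payload) {
    int n = payload.length;
    byte[] out = new byte[4 + n];
    out[0] = (byte) (n >>> 24);
    out[1] = (byte) (n >>> 16);
    out[2] = (byte) (n >>> 8);
    out[3] = (byte) n;
    System.arraycopy(payload, 0, out, 4, n);
    return out;
  }

  public static void main(String[] args) {
    byte[] raw = serialize(new byte[]{1, 2, 3});
    // The embedded size always equals the raw length minus the prefix.
    System.out.println(readInt(raw, 0) == raw.length - 4);  // true
  }
}
{noformat}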
> Text does, but it already uses WritableUtils::getVIntSize which should be all
> that's required.
I think this case is a bit more complicated, as I mentioned above. Calculating
the length without parsing it from the buffer requires some VInt logic that's
not in getVIntSize: we're passed x and need to compute y, where
x = getVIntSize(y) + y.
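One way out of the x = getVIntSize(y) + y problem is to recover the vint's own width from just its first byte, then subtract it from the raw length — no full parse needed. A sketch of what that could look like; decodeVIntSize below follows Hadoop's vint encoding rules, and the class/method names are illustrative, not the actual patch:

{noformat}
public class TextCompareSketch {

  // How many bytes a Hadoop vint occupies, determined from its first byte.
  static int decodeVIntSize(byte first) {
    if (first >= -112) {
      return 1;               // value stored directly in one byte
    } else if (first < -120) {
      return -119 - first;    // negative value: header byte + 1..8 data bytes
    }
    return -111 - first;      // positive value: header byte + 1..8 data bytes
  }

  // Compare two serialized Text values using only the raw lengths passed
  // in, skipping the vint length prefixes without parsing them fully.
  static int compare(byte[] b1, int s1, int l1,
                     byte[] b2, int s2, int l2) {
    int n1 = decodeVIntSize(b1[s1]);
    int n2 = decodeVIntSize(b2[s2]);
    return compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);
  }

  // Lexicographic unsigned byte comparison, as in WritableComparator.
  static int compareBytes(byte[] b1, int s1, int l1,
                          byte[] b2, int s2, int l2) {
    int end = Math.min(l1, l2);
    for (int i = 0; i < end; i++) {
      int a = b1[s1 + i] & 0xff;
      int b = b2[s2 + i] & 0xff;
      if (a != b) {
        return a - b;
      }
    }
    return l1 - l2;
  }

  public static void main(String[] args) {
    // "ab" vs "ac" as Text serializes them: 1-byte vint length + UTF-8.
    byte[] x = {2, 'a', 'b'};
    byte[] y = {2, 'a', 'c'};
    System.out.println(compare(x, 0, x.length, y, 0, y.length) < 0);  // true
  }
}
{noformat}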
> Text and BytesWritable's raw comparators should use the lengths provided
> instead of rebuilding them from scratch using readInt
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-3046
> URL: https://issues.apache.org/jira/browse/HADOOP-3046
> Project: Hadoop Core
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Fix For: 0.17.0
>
>
> It would be much faster to use the key length provided by the raw compare
> function rather than rebuilding the integer lengths back up from bytes twice
> for every comparison in the sort.