[
https://issues.apache.org/jira/browse/TEZ-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621075#comment-14621075
]
Gopal V edited comment on TEZ-2606 at 7/9/15 6:58 PM:
------------------------------------------------------
bq. And even that doesn't really work unless you use TEZ-1288, since for most
Writables the first few bytes are the size of the structure, not part of the
key.
That is easily bypassed if the target application takes an explicit
dependency on Tez.
But consider BytesWritable, for instance, which does this in its comparator:
{code}
@Override
public int compare(byte[] b1, int s1, int l1,
                   byte[] b2, int s2, int l2) {
  // Skip the length header on both sides so only the payload bytes are compared.
  return compareBytes(b1, s1+LENGTH_BYTES, l1-LENGTH_BYTES,
                      b2, s2+LENGTH_BYTES, l2-LENGTH_BYTES);
}
{code}
Its {{getProxy()}} can likewise skip {{LENGTH_BYTES}} before producing a proxy for
comparison, which is why it was renamed from the original PrefixComparator to
ProxyComparator.
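A minimal sketch of what such a length-skipping proxy could look like, assuming the key is serialized as a 4-byte length header followed by the payload. {{LengthSkippingProxy}} and its static methods are hypothetical names for illustration, not the Tez ProxyComparator interface. The proxy here is assumed to be the first four payload bytes packed big-endian and compared as unsigned, with the full byte comparison as the tie-breaker.

```java
// Hypothetical illustration only; not the Tez or Hadoop API.
final class LengthSkippingProxy {
    // Assumption: a 4-byte length header precedes the key payload.
    static final int LENGTH_BYTES = 4;

    /** Packs up to the first four payload bytes into an int (big-endian, zero-padded). */
    static int getProxy(byte[] buf, int start, int len) {
        int payloadStart = start + LENGTH_BYTES;
        int payloadLen = Math.max(0, len - LENGTH_BYTES);
        int proxy = 0;
        for (int i = 0; i < 4; i++) {
            int b = i < payloadLen ? (buf[payloadStart + i] & 0xff) : 0;
            proxy = (proxy << 8) | b;
        }
        return proxy;
    }

    /** Compares the proxies first and only touches the full keys on a tie. */
    static int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
        int c = Integer.compareUnsigned(getProxy(b1, s1, l1), getProxy(b2, s2, l2));
        if (c != 0) {
            return c;
        }
        return compareBytes(b1, s1 + LENGTH_BYTES, l1 - LENGTH_BYTES,
                            b2, s2 + LENGTH_BYTES, l2 - LENGTH_BYTES);
    }

    /** Unsigned lexicographic comparison of two byte ranges. */
    static int compareBytes(byte[] b1, int s1, int l1,
                            byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }
}
```

Because the padding is zero and the comparison is unsigned, two proxies that differ always order the same way as the full payloads; equal proxies say nothing and fall through to {{compareBytes}}.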
> Cache-friendly data structure for sorting
> -----------------------------------------
>
> Key: TEZ-2606
> URL: https://issues.apache.org/jira/browse/TEZ-2606
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Tsuyoshi Ozawa
> Assignee: Tsuyoshi Ozawa
>
> AlphaSort [1] mentions that prefix-key sorting is an effective approach. I'd like to
> suggest changing the layout of the ring buffer to include a prefix of the key in the
> metadata. This can improve the cache hit rate when sorting.
> [1] Alphasort: http://dl.acm.org/citation.cfm?id=615237
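The layout proposed in the description can be sketched roughly as follows. This is an illustration under stated assumptions, not the Tez sorter: {{PrefixMetaSketch}}, its fields, and the three-int metadata record (prefix, offset, length) are all hypothetical, and the buffer is flat with no wrap-around. The point is that most comparisons are resolved from the metadata array alone, so the key bytes are only read when two prefixes tie.

```java
import java.util.Arrays;

// Hypothetical illustration of key prefixes stored in the sort metadata.
final class PrefixMetaSketch {
    // Assumed metadata record layout: [prefix, offset, length].
    static final int PREFIX = 0, OFFSET = 1, LENGTH = 2, STRIDE = 3;

    private final byte[] data;   // serialized key bytes
    private final int[] meta;    // one STRIDE-sized record per key
    private int records = 0;
    private int used = 0;

    PrefixMetaSketch(int dataCapacity, int maxRecords) {
        data = new byte[dataCapacity];
        meta = new int[maxRecords * STRIDE];
    }

    /** Appends a key and stores its first four bytes (zero-padded) next to its offset. */
    void add(byte[] key) {
        System.arraycopy(key, 0, data, used, key.length);
        int prefix = 0;
        for (int i = 0; i < 4; i++) {
            prefix = (prefix << 8) | (i < key.length ? (key[i] & 0xff) : 0);
        }
        int m = records * STRIDE;
        meta[m + PREFIX] = prefix;
        meta[m + OFFSET] = used;
        meta[m + LENGTH] = key.length;
        used += key.length;
        records++;
    }

    /** Compares two records by prefix; reads the key bytes only on a tie. */
    int compare(int i, int j) {
        int mi = i * STRIDE, mj = j * STRIDE;
        int c = Integer.compareUnsigned(meta[mi + PREFIX], meta[mj + PREFIX]);
        if (c != 0) {
            return c;
        }
        int oi = meta[mi + OFFSET], li = meta[mi + LENGTH];
        int oj = meta[mj + OFFSET], lj = meta[mj + LENGTH];
        int n = Math.min(li, lj);
        for (int k = 0; k < n; k++) {
            int a = data[oi + k] & 0xff, b = data[oj + k] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return li - lj;
    }

    /** Returns record indices in sorted key order. */
    int[] sortedOrder() {
        Integer[] idx = new Integer[records];
        for (int i = 0; i < records; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, (a, b) -> compare(a, b));
        int[] out = new int[records];
        for (int i = 0; i < records; i++) {
            out[i] = idx[i];
        }
        return out;
    }
}
```

A real sorter would sort the metadata records in place rather than boxing indices; the boxed array is used here only to keep the sketch short.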