[ 
https://issues.apache.org/jira/browse/TEZ-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621075#comment-14621075
 ] 

Gopal V edited comment on TEZ-2606 at 7/9/15 6:58 PM:
------------------------------------------------------

bq. And even that doesn't really work unless you use TEZ-1288, since for most 
Writables the first few bytes are the size of the structure, not part of the 
key.

That is easily bypassed if the target application takes an explicit 
dependency on Tez.

But consider BytesWritable, for instance, which does this in its comparator:

{code}
    @Override
    public int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
      return compareBytes(b1, s1+LENGTH_BYTES, l1-LENGTH_BYTES, 
                          b2, s2+LENGTH_BYTES, l2-LENGTH_BYTES);
    }
{code}

Its {{getProxy()}} can likewise skip {{LENGTH_BYTES}} before producing a proxy for 
comparison (which is why it was renamed from the original PrefixComparator to 
ProxyComparator).
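
As a rough illustration of that idea, here is a minimal, self-contained sketch. It is not the Tez or Hadoop code: the class {{ProxySketch}} and its methods {{proxy}}, {{compareFull}}, {{compare}} and {{serialize}} are hypothetical names, and it assumes a 4-byte big-endian length header in front of the key bytes. The proxy is built from the first four key bytes after the header, so two keys of equal length no longer collapse onto the same proxy value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch only; not the Tez ProxyComparator or Hadoop BytesWritable code. */
final class ProxySketch {
    // Assumed header size, mirroring LENGTH_BYTES in the comparator quoted above.
    static final int LENGTH_BYTES = 4;

    /** Packs the first 4 key bytes after the length header into an int (zero-padded). */
    static int proxy(byte[] buf, int start, int len) {
        int off = start + LENGTH_BYTES;      // skip the length header
        int payload = len - LENGTH_BYTES;    // number of bytes that belong to the key
        int p = 0;
        for (int i = 0; i < 4; i++) {
            int b = i < payload ? (buf[off + i] & 0xff) : 0; // zero-pad short keys
            p = (p << 8) | b;
        }
        return p;                            // must be compared as an unsigned value
    }

    /** Full unsigned lexicographic comparison of the key bytes, header skipped. */
    static int compareFull(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int o1 = s1 + LENGTH_BYTES, n1 = l1 - LENGTH_BYTES;
        int o2 = s2 + LENGTH_BYTES, n2 = l2 - LENGTH_BYTES;
        int n = Math.min(n1, n2);
        for (int i = 0; i < n; i++) {
            int x = b1[o1 + i] & 0xff, y = b2[o2 + i] & 0xff;
            if (x != y) return x - y;
        }
        return n1 - n2;                      // a shorter key that is a prefix sorts first
    }

    /** Compares proxies first; falls back to the full comparison only on a tie. */
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int c = Integer.compareUnsigned(proxy(b1, s1, l1), proxy(b2, s2, l2));
        return c != 0 ? c : compareFull(b1, s1, l1, b2, s2, l2);
    }

    /** Test helper: prepends the assumed 4-byte length header to the key bytes. */
    static byte[] serialize(byte[] key) {
        return ByteBuffer.allocate(LENGTH_BYTES + key.length)
                .putInt(key.length).put(key).array();
    }

    public static void main(String[] args) {
        byte[] a = serialize("apple".getBytes(StandardCharsets.US_ASCII));
        byte[] b = serialize("apply".getBytes(StandardCharsets.US_ASCII));
        System.out.println(Integer.toHexString(proxy(a, 0, a.length)));
        System.out.println(compare(a, 0, a.length, b, 0, b.length) < 0);
    }
}
```

In a real sort buffer the proxy would be stored next to the record metadata so that most comparisons never touch the key bytes; here it is recomputed on every call to keep the sketch short.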


> Cache-friendly data structure for sorting
> -----------------------------------------
>
>                 Key: TEZ-2606
>                 URL: https://issues.apache.org/jira/browse/TEZ-2606
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Tsuyoshi Ozawa
>            Assignee: Tsuyoshi Ozawa
>
> AlphaSort [1] reports that sorting on key prefixes is an effective technique. I'd like 
> to suggest changing the layout of the ring buffer to include a prefix of each key in 
> the metadata. This can improve the cache hit rate when sorting.
> [1] AlphaSort: http://dl.acm.org/citation.cfm?id=615237



