[jira] [Commented] (CRUNCH-368) TupleWritable.Comparator

Chao Shi (JIRA) Wed, 02 Apr 2014 07:43:33 -0700

    [ 
https://issues.apache.org/jira/browse/CRUNCH-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957697#comment-13957697
 ]


Chao Shi commented on CRUNCH-368:
---------------------------------

Yes, I agree with you that this should rarely happen. So I think it is 
reasonable to compare on the type code first. With this, we can simply skip 
calling the real comparator, which would likely fail anyway.

Another reason for this concerns the implementation of the new comparator. In 
compareField(), it tries to get the comparator of the inner writable type, 
which is registered per type. If comparison across different writable types 
were allowed, we would have to fall back to the old comparator.
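
To illustrate the idea, here is a minimal sketch of comparing on the type code 
first and only then delegating to a per-type comparator. It is not the code 
from the patch: the class name, the type codes, the registry map, and this 
compareField() signature are all made up for the example, and it uses plain 
byte arrays rather than Hadoop classes.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the CRUNCH-368 patch.
final class TypeCodeFirstSketch {
    // Hypothetical type codes, chosen only for this example.
    static final int INT_CODE = 1;
    static final int TEXT_CODE = 2;

    // One raw-bytes comparator registered per type code.
    static final Map<Integer, Comparator<byte[]>> COMPARATORS = new HashMap<>();

    static {
        // 4-byte big-endian ints: decode and compare numerically.
        COMPARATORS.put(INT_CODE, (a, b) ->
            Integer.compare(ByteBuffer.wrap(a).getInt(), ByteBuffer.wrap(b).getInt()));
        // Text: unsigned lexicographic order over the encoded bytes.
        COMPARATORS.put(TEXT_CODE, TypeCodeFirstSketch::compareBytes);
    }

    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    static int compareField(int codeA, byte[] a, int codeB, byte[] b) {
        // Different types: order by type code and never call a per-type
        // comparator on bytes it does not understand.
        if (codeA != codeB) {
            return Integer.compare(codeA, codeB);
        }
        Comparator<byte[]> cmp = COMPARATORS.get(codeA);
        if (cmp == null) {
            throw new IllegalStateException("no comparator for type code " + codeA);
        }
        return cmp.compare(a, b);
    }
}
```

With this shape, two fields of different types get a stable order from the 
type codes alone, so the per-type lookup only ever sees two values of the same 
type.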

> TupleWritable.Comparator
> ------------------------
>
>                 Key: CRUNCH-368
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-368
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.10.0, 0.8.3
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>         Attachments: crunch-368 benchmark.pdf, crunch-368.patch, gen_data.py
>
>
> This patch should improve comparison performance on TupleWritables. It saves 
> the deserialization overhead. It is particularly useful when the input tuples 
> are large, e.g. contain long strings.
> Please note that this changes the binary format of TupleWritable. It adds a 
> var-int indicating the size of the field after each type code. This is needed 
> because of a limitation of the Writable system: we do not know the size of 
> each field until fully deserializing it. 
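
As a rough sketch of the format change described in the issue, the following 
writes each field as a type code, a var-int size, and then the payload, so 
that a reader can step over a field without deserializing it. The class and 
method names are hypothetical, the one-byte type code is an assumption, and 
the var-int here is a simple 7-bits-per-byte encoding that is not necessarily 
the one the patch uses.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not the actual TupleWritable layout.
final class SizePrefixedFieldsSketch {
    // Unsigned var-int: 7 bits per byte, high bit set on all but the last byte.
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7f) != 0) {
            out.write((value & 0x7f) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Appends one field as: type code (1 byte), var-int payload size, payload.
    static void writeField(ByteArrayOutputStream out, int typeCode, byte[] payload) {
        out.write(typeCode);
        writeVarInt(out, payload.length);
        out.write(payload, 0, payload.length);
    }

    // Returns the offset just past the field that starts at pos, reading only
    // the type code and the size prefix, never the payload itself.
    static int skipField(byte[] buf, int pos) {
        pos++; // step over the type code
        int size = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xff;
            size |= (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return pos + size;
    }
}
```

Because the size is stored up front, a raw comparator can locate the start of 
every field in both buffers by reading only these small headers, which is 
where the saving on large fields such as long strings comes from.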



--
This message was sent by Atlassian JIRA
(v6.2#6252)
