[
https://issues.apache.org/jira/browse/CRUNCH-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957697#comment-13957697
]
Chao Shi commented on CRUNCH-368:
---------------------------------
Yes, I agree with you that this should rarely happen. So I think it is
reasonable to compare on the type code first. With this, we can simply skip
calling the real comparator, which would likely fail anyway.
Another reason for this is the implementation of the new comparator. In
compareField(), it tries to get the comparator of the inner writable type,
which is registered per type. If comparison between different writable types
were allowed, we would have to fall back to the old comparator.
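
To make the idea concrete, here is a rough sketch of comparing on the type
code first and only then handing the payload to a per-type raw comparator.
This is not the code from crunch-368.patch: the class name, the type codes,
the helper methods and the one-byte size prefix (the patch uses a var-int)
are all assumptions made for illustration, and it uses no Hadoop classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; names and layout are assumptions, not the patch.
final class FieldCompareSketch {
  /** Compares two payloads directly on their serialized bytes. */
  interface RawComparator {
    int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen);
  }

  static final int INT_CODE = 1;   // assumed type code for a 4-byte int
  static final int TEXT_CODE = 2;  // assumed type code for UTF-8 text

  // One raw comparator per type code ("registered per type").
  static final Map<Integer, RawComparator> COMPARATORS = new HashMap<>();
  static {
    COMPARATORS.put(INT_CODE,
        (a, ao, al, b, bo, bl) -> Integer.compare(readInt(a, ao), readInt(b, bo)));
    COMPARATORS.put(TEXT_CODE, FieldCompareSketch::compareBytes);
  }

  static int readInt(byte[] buf, int off) {
    return ((buf[off] & 0xff) << 24) | ((buf[off + 1] & 0xff) << 16)
        | ((buf[off + 2] & 0xff) << 8) | (buf[off + 3] & 0xff);
  }

  // Unsigned lexicographic comparison; a shorter prefix sorts first.
  static int compareBytes(byte[] a, int ao, int al, byte[] b, int bo, int bl) {
    int n = Math.min(al, bl);
    for (int i = 0; i < n; i++) {
      int d = (a[ao + i] & 0xff) - (b[bo + i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return al - bl;
  }

  // Assumed field layout: [type code: 1 byte][size: 1 byte][payload].
  static int compareField(byte[] a, int aOff, byte[] b, int bOff) {
    int codeA = a[aOff] & 0xff;
    int codeB = b[bOff] & 0xff;
    if (codeA != codeB) {
      // Different types: order by type code, never call a real comparator.
      return Integer.compare(codeA, codeB);
    }
    RawComparator cmp = COMPARATORS.get(codeA);
    if (cmp == null) {
      throw new IllegalStateException("no comparator for type code " + codeA);
    }
    int sizeA = a[aOff + 1] & 0xff;
    int sizeB = b[bOff + 1] & 0xff;
    return cmp.compare(a, aOff + 2, sizeA, b, bOff + 2, sizeB);
  }

  // Helpers that build one serialized field (payloads up to 255 bytes).
  static byte[] field(int code, byte[] payload) {
    byte[] out = new byte[payload.length + 2];
    out[0] = (byte) code;
    out[1] = (byte) payload.length;
    System.arraycopy(payload, 0, out, 2, payload.length);
    return out;
  }

  static byte[] intField(int v) {
    return field(INT_CODE,
        new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v});
  }

  static byte[] textField(String s) {
    return field(TEXT_CODE, s.getBytes(StandardCharsets.UTF_8));
  }
}
```

With this shape, two fields of different types are ordered by type code alone,
so the per-type comparator only ever sees payloads of its own type and no
fallback to the old comparator is needed.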
> TupleWritable.Comparator
> ------------------------
>
> Key: CRUNCH-368
> URL: https://issues.apache.org/jira/browse/CRUNCH-368
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.10.0, 0.8.3
> Reporter: Chao Shi
> Assignee: Chao Shi
> Attachments: crunch-368 benchmark.pdf, crunch-368.patch, gen_data.py
>
>
> This patch should improve comparison performance on TupleWritables. It avoids
> the deserialization overhead. It is particularly useful when the input tuples
> are large, e.g. when they contain long strings.
> Please note that this changes the binary format of TupleWritable. It adds a
> var-int indicating the size of the field after each type code. This works
> around a limitation of the Writable system: we do not know the size of each
> field until it is fully deserialized.