[
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480378#comment-13480378
]
Jonathan Coveney commented on PIG-2975:
---------------------------------------
As a side note, Koji, if you make a new JIRA specifically about improving
BinInterSedesRawComparator's handling of DataByteArrays, I will review and
commit it. And if you want to learn Pig, you could make another JIRA about
improving the performance in general. IMHO BinInterSedes (and the whole code
path that touches it) could probably be significantly improved.
W.r.t. this issue, I think we should either directly compare the bytes
(currently leaning towards this), or have a special lightweight comparator
that special-cases DataByteArrays and delegates to
BinInterSedesRawComparator otherwise. We wouldn't need the complexity of the
union approach, and we should get correctness, speed, and a stable bytearray
sort order.
That said, IF we decide to preserve byte array sort order, I think we should
make a decision now about whether or not we want to define that semantic. If
not, then just directly comparing the bytes should be a-ok, since all that is
important for bytearrays currently is that a global ordering exists, not what
that global ordering is.
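To make the two options concrete, here is a rough sketch, not the actual
patch. The RawCompare interface, the BYTEARRAY_TAG value, and the
one-byte-tag layout are all hypothetical stand-ins for illustration; they are
not Pig's or Hadoop's real interface or BinInterSedes' real encoding.
compareBytes is the "directly compare the bytes" option, and byteArrayFirst
is the lightweight comparator that special-cases byte arrays and delegates
everything else.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RawByteOrderSketch {

    /** Hypothetical stand-in for a raw (serialized-bytes) comparator. */
    interface RawCompare {
        int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
    }

    /** Hypothetical type tag for a serialized byte array; the real encoding may differ. */
    static final byte BYTEARRAY_TAG = 1;

    /** Unsigned lexicographic comparison; a shorter prefix sorts first. */
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff; // mask so 0x80..0xff sort after 0x00..0x7f
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return Integer.compare(l1, l2);
    }

    /** Compares tagged byte arrays directly and delegates every other case to the fallback. */
    static RawCompare byteArrayFirst(RawCompare fallback) {
        return (b1, s1, l1, b2, s2, l2) -> {
            boolean bothByteArrays = l1 > 0 && l2 > 0
                    && b1[s1] == BYTEARRAY_TAG && b2[s2] == BYTEARRAY_TAG;
            if (bothByteArrays) {
                // Skip the tag byte and compare the payloads.
                return compareBytes(b1, s1 + 1, l1 - 1, b2, s2 + 1, l2 - 1);
            }
            return fallback.compare(b1, s1, l1, b2, s2, l2);
        };
    }

    public static void main(String[] args) {
        // The three values from the failing test, treated as raw bytes.
        byte[][] keys = {
            "1".getBytes(StandardCharsets.US_ASCII),
            "3".getBytes(StandardCharsets.US_ASCII),
            "22".getBytes(StandardCharsets.US_ASCII),
        };
        Arrays.sort(keys, (x, y) -> compareBytes(x, 0, x.length, y, 0, y.length));
        for (byte[] k : keys) {
            System.out.println(new String(k, StandardCharsets.US_ASCII));
        }
        // prints 1, 22, 3 (one per line)
    }
}
```

Note that raw byte order puts "22" before "3". That is a fixed global
ordering with no value lost, but it is not numeric order, which is why it
matters whether we want to define the semantic.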
> TestTypedMap.testOrderBy failing with incorrect result
> -------------------------------------------------------
>
> Key: PIG-2975
> URL: https://issues.apache.org/jira/browse/PIG-2975
> Project: Pig
> Issue Type: Sub-task
> Affects Versions: 0.11
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Blocker
> Fix For: 0.11
>
> Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch,
> pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt,
> pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at
> {noformat}
> junit.framework.AssertionFailedError
> at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main -x local test/orderby.pig
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}