[ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480455#comment-13480455
 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

This is one benefit (and, in some senses, the drawback) of using 
BinInterSedesTupleRawComparator. Because of how Tuples are serialized, it is in 
fact using the "proper" raw comparator (and thus providing the proper sort 
order) even though the user did not specify a Schema.

I found Gianmarco's argument for trying to make BinInterSedesTupleRawComparator 
faster fairly persuasive, though that code has a different goal.

I guess this comes down to how nice we want to be to people who do not specify 
a Schema. We can take a performance hit and try to figure things out for them, 
or we can make it blazing fast but with arbitrary guarantees.

Given that the way to free yourself from those arbitrary guarantees is "add a 
schema," you would then lose the speed benefits anyway. This, to me, is an 
argument for using BinInterSedesTupleRawComparator: if this is the "preferred" 
path, we should use it and, as Gianmarco said, spend time optimizing it, since 
it is a pretty important code path for a lot more code than just this case. 
The exception would be if we want to promote using DataByteArrays explicitly 
because we can do a much faster sort. I do not think this is what we should 
advocate, though if something is legitimately a DataByteArray there is no 
reason not to optimize that path so it's very fast (it should be, eh?).
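
To make the trade-off above concrete, here is a minimal sketch. It is not Pig 
code: the class SortOrderSketch and its RAW and TYPED comparators are 
hypothetical stand-ins, and the values are assumed to be stored as UTF-8 text. 
It only contrasts a byte-wise order, which needs no type information, with an 
order that has to decode each value before comparing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only; none of these names exist in Pig.
class SortOrderSketch {

    // Byte-wise comparison: works directly on the stored bytes without
    // knowing their type, so the resulting order is lexicographic.
    static final Comparator<byte[]> RAW = (x, y) -> {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int cmp = (x[i] & 0xff) - (y[i] & 0xff); // compare as unsigned bytes
            if (cmp != 0) {
                return cmp;
            }
        }
        return x.length - y.length; // on a shared prefix, shorter sorts first
    };

    // Type-aware comparison: decodes every value first (the extra cost),
    // then compares numerically. Assumes each value is a decimal integer.
    static final Comparator<byte[]> TYPED = Comparator.comparingLong(
            (byte[] b) -> Long.parseLong(new String(b, StandardCharsets.UTF_8)));

    // Encodes the strings as UTF-8 bytes, sorts them, and decodes them again.
    static String[] sort(String[] values, Comparator<byte[]> order) {
        return Arrays.stream(values)
                .map(s -> s.getBytes(StandardCharsets.UTF_8))
                .sorted(order)
                .map(b -> new String(b, StandardCharsets.UTF_8))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] keys = {"1", "3", "22"}; // the key values from the report
        System.out.println(Arrays.toString(sort(keys, RAW)));
        System.out.println(Arrays.toString(sort(keys, TYPED)));
    }
}
```

On the keys from the report (1, 3 and 22), the byte-wise order comes out as 
1, 22, 3 and the decoded order as 1, 3, 22.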

Thoughts?

Thanks for hashing this out, guys.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, 
> pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, 
> pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira