[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480455#comment-13480455 ]
Jonathan Coveney commented on PIG-2975:
---------------------------------------

This is one benefit (and in some senses, the drawback) of using BinInterSedesRawComparator. Because of how Tuples are serialized, it in fact is using the "proper" raw comparator (and thus, providing the proper sort order) even though the user did not specify a Schema. I found Gianmarco's argument towards trying to make BinInterSedesRawComparator fairly persuasive, though that code has a different goal.

I guess this comes down to how nice we want to be to people given that they do not specify a Schema. We can take a performance hit and try to figure things out for them, or we can make it blazing fast but with arbitrary guarantees. Given that the way to free yourself from those arbitrary guarantees is "add a schema," you would then lose the speed benefits anyway.

This, to me, is an argument for using BinInterSedesTupleRawComparator, in the sense that if this is the "preferred" path, we should use it and, as Gianmarco said, spend time optimizing it (since it is a pretty important code path for a lot more code than just this case). UNLESS we want to promote using DataByteArrays explicitly because we can do a much faster sort (I do not think this is what we should advocate, though if something is legitimately a DataByteArray there is no reason not to try to optimize that path so it's very fast...it should be, eh?).

Thoughts? Thanks for hashing this out, guys.
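To make the tradeoff above concrete, here is a minimal sketch of how a pure byte-wise comparison and a type-aware comparison can order the same values differently. It is not Pig code and does not reproduce the missing '(3)' from the report. The class RawVsTypedOrder and its methods are hypothetical names made up for illustration, and it assumes the untyped values arrive as UTF-8 text bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only; none of these names exist in Pig.
public class RawVsTypedOrder {

    // "Fast" path: unsigned lexicographic comparison of the raw bytes,
    // with no knowledge of what the bytes mean.
    static int compareRaw(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // "Nice" path: decode each value first (assumed here to be a decimal
    // integer), then compare by value. Slower, but the order is meaningful.
    static int compareTyped(byte[] a, byte[] b) {
        long x = Long.parseLong(new String(a, StandardCharsets.UTF_8));
        long y = Long.parseLong(new String(b, StandardCharsets.UTF_8));
        return Long.compare(x, y);
    }

    // Encode the strings as UTF-8 bytes, sort with the given comparator,
    // and decode back so the resulting order is easy to read.
    static String[] sorted(String[] values, Comparator<byte[]> cmp) {
        byte[][] raw = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            raw[i] = values[i].getBytes(StandardCharsets.UTF_8);
        }
        Arrays.sort(raw, cmp);
        String[] out = new String[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = new String(raw[i], StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] input = {"1", "3", "22"};
        // Byte-wise order: [1, 22, 3]
        System.out.println(Arrays.toString(sorted(input, RawVsTypedOrder::compareRaw)));
        // Value order: [1, 3, 22]
        System.out.println(Arrays.toString(sorted(input, RawVsTypedOrder::compareTyped)));
    }
}
```

The byte-wise comparator never decodes anything, which is where the speed comes from, but its order only matches the user's expectation when the byte encoding happens to sort the same way as the values.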

> TestTypedMap.testOrderBy failing with incorrect result
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch,
> pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt,
> pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main -x local test/orderby.pig
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira