[ https://issues.apache.org/jira/browse/PIG-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981795#comment-15981795 ]
Adam Szita commented on PIG-5164:
---------------------------------

This is due to {{SecondaryKeySortUtil.AccumulateByKey}} not handling null keys properly. It keeps accumulating tuples whose key is null and then merges the accumulated tuple list with the first non-null-keyed tuple (in the example in the issue description, the one with "alice falkner"). That is how it produces a wrong result for the non-null-keyed tuple and leaves the null-keyed tuples out of the result.

In my patch [^PIG-5164.0.patch] I've fixed this by introducing an {{initialized}} flag so that we no longer confuse the two reasons the key can be null: no tuple has been processed yet, or the current tuple's key really is null.

[~kellyzly] can you please review?

> MultiQuery_Union_3 is failing with spark exec type
> --------------------------------------------------
>
> Key: PIG-5164
> URL: https://issues.apache.org/jira/browse/PIG-5164
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Nandor Kollar
> Assignee: Adam Szita
> Fix For: spark-branch
>
> Attachments: PIG-5164.0.patch
>
>
> Outputs are different:
> first output
> {code}
> diff MultiQuery_Union_3.out/1/out_sorted MultiQuery_Union_3_benchmark.out/1/out_sorted
> 0a1
> > {(,,,,)} 12110.0
> 6c7
> < {(alice falkner,,,,)} 19007.0
> ---
> > {(alice falkner,24,0.81,,911.81)} 6897.0
> {code}
> the second output is entirely different:
> {code}
> diff <(head -n 5 MultiQuery_Union_3.out/2/out_sorted) <(head -n 5 MultiQuery_Union_3_benchmark.out/2/out_sorted)
> 1,5c1,5
> < {(alice allen,69,1.95,socialist,499.63)} 2422.0
> < {(alice brown,76,1.52,socialist,791.95)} 10575.0
> < {(alice carson,66,1.01,socialist,421.71)} 2445.0
> < {(alice davidson,72,0.25,socialist,347.66)} 5104.0
> < {(alice ellison,67,1.96,socialist,557.02)} 2737.0
> ---
> > {(,,3.97,,)} 12110.0
> > {(alice allen,46,1.71,green,766.16)} 2422.0
> > {(alice brown,23,0.79,,917.19)} 10575.0
> > {(alice carson,24,1.49,democrat,607.27)} 2445.0
> > {(alice davidson,,0.66,,491.80)} 5104.0
> {code}
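
For illustration, here is a minimal sketch of the null-key confusion described in the comment. It is not the Pig source and not the attached patch: {{NullKeyAccumulation}}, {{Row}}, {{groupBuggy}} and {{groupFixed}} are hypothetical names, plain strings stand in for Pig tuples, and the input is assumed to be already sorted by key with nulls first. The only idea taken from the comment is that a separate {{initialized}} flag distinguishes "nothing accumulated yet" from "the current key is null".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for key-based accumulation over a sorted stream.
final class NullKeyAccumulation {

    /** A (key, value) pair; the key may legitimately be null. */
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Buggy variant: curKey == null means both "no row seen yet" and "null key". */
    static List<List<String>> groupBuggy(List<Row> sortedRows) {
        List<List<String>> groups = new ArrayList<>();
        List<String> bag = new ArrayList<>();
        String curKey = null;
        for (Row r : sortedRows) {
            // A null curKey never triggers a flush, so null-keyed rows
            // leak into the bag of the first non-null key.
            if (curKey != null && !curKey.equals(r.key)) {
                groups.add(bag);
                bag = new ArrayList<>();
            }
            curKey = r.key;
            bag.add(r.value);
        }
        if (!bag.isEmpty()) {
            groups.add(bag);
        }
        return groups;
    }

    /** Fixed variant: an explicit flag tracks whether any row has been seen. */
    static List<List<String>> groupFixed(List<Row> sortedRows) {
        List<List<String>> groups = new ArrayList<>();
        List<String> bag = new ArrayList<>();
        String curKey = null;
        boolean initialized = false;
        for (Row r : sortedRows) {
            // Objects.equals treats two nulls as equal, so null is an ordinary key.
            if (initialized && !Objects.equals(curKey, r.key)) {
                groups.add(bag);
                bag = new ArrayList<>();
            }
            curKey = r.key;
            initialized = true;
            bag.add(r.value);
        }
        if (initialized) {
            groups.add(bag);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(null, "a"),
                new Row(null, "b"),
                new Row("alice falkner", "c"),
                new Row("bob", "d"));
        System.out.println(groupBuggy(rows)); // prints [[a, b, c], [d]]
        System.out.println(groupFixed(rows)); // prints [[a, b], [c], [d]]
    }
}
```

In the buggy variant the two null-keyed values end up in the same group as the "alice falkner" value and no separate null group is emitted, which mirrors the symptom in the first diff of the description.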