[ https://issues.apache.org/jira/browse/PIG-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981795#comment-15981795 ]

Adam Szita commented on PIG-5164:
---------------------------------

This is due to {{SecondaryKeySortUtil.AccumulateByKey}} not handling null keys properly: it keeps accumulating tuples whose key is null, and then merges that accumulated tuple list into the group of the first tuple with a non-null key ("alice falkner" in the example from the description).

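That is why the non-null keyed group gets a wrong result (in the first output, 19007.0 = 6897.0 + 12110.0, i.e. the expected "alice falkner" value plus the value of the null-keyed row), and why the null-keyed tuples are missing from the result altogether.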
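A minimal sketch of this failure mode, assuming the accumulator uses {{curKey == null}} as its "no group started yet" sentinel. The class and method names ({{BuggyAccumulator}}, {{group}}, {{rec}}) are hypothetical, and this is not the actual {{SecondaryKeySortUtil}} code:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Pig's SecondaryKeySortUtil.AccumulateByKey.
// Groups consecutive (key, value) records of a key-sorted stream, but uses
// "curKey == null" to mean "no group has started yet".
class BuggyAccumulator {
    static Map.Entry<String, Integer> rec(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    static List<Map.Entry<String, List<Integer>>> group(List<Map.Entry<String, Integer>> sorted) {
        List<Map.Entry<String, List<Integer>>> out = new ArrayList<>();
        String curKey = null;
        List<Integer> bag = new ArrayList<>();
        for (Map.Entry<String, Integer> r : sorted) {
            if (curKey == null) {
                // BUG: a real null key looks exactly like "not started yet", so
                // null-keyed values pile up in the bag, and the first non-null
                // key then takes over the bag with those values still in it.
                curKey = r.getKey();
                bag.add(r.getValue());
            } else if (curKey.equals(r.getKey())) {
                bag.add(r.getValue());
            } else {
                // Key changed: emit the finished group and start a new one.
                out.add(new SimpleEntry<>(curKey, bag));
                curKey = r.getKey();
                bag = new ArrayList<>();
                bag.add(r.getValue());
            }
        }
        // A trailing null-keyed group is also silently dropped here.
        if (curKey != null) {
            out.add(new SimpleEntry<>(curKey, bag));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList(
                rec(null, 1), rec(null, 2), rec("alice", 3), rec("bob", 4))));
    }
}
```

For the key-sorted input (null, 1), (null, 2), ("alice", 3), ("bob", 4) this prints {{[alice=[1, 2, 3], bob=[4]]}}: the null group disappears and its values end up under "alice", the same pattern as the "alice falkner" row in the description.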

In my patch [^PIG-5164.0.patch] I've fixed this by introducing an {{initialized}} flag, so that "no key has been seen yet" is no longer confused with "the current key is null".
[~kellyzly], can you please review?
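A minimal sketch of the {{initialized}} flag idea, under the same assumptions as above. {{FixedAccumulator}}, {{group}} and {{rec}} are hypothetical names, and this is not the code in [^PIG-5164.0.patch]:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration of the "initialized flag" idea; not the code in PIG-5164.0.patch.
class FixedAccumulator {
    static Map.Entry<String, Integer> rec(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    static List<Map.Entry<String, List<Integer>>> group(List<Map.Entry<String, Integer>> sorted) {
        List<Map.Entry<String, List<Integer>>> out = new ArrayList<>();
        boolean initialized = false; // becomes true once the first record is seen
        String curKey = null;        // after initialization, null is a real key
        List<Integer> bag = new ArrayList<>();
        for (Map.Entry<String, Integer> r : sorted) {
            if (!initialized) {
                initialized = true;
                curKey = r.getKey();
                bag.add(r.getValue());
            } else if (Objects.equals(curKey, r.getKey())) {
                // Null-safe comparison: two null keys belong to the same group.
                bag.add(r.getValue());
            } else {
                // Key changed: emit the finished group and start a new one.
                out.add(new SimpleEntry<>(curKey, bag));
                curKey = r.getKey();
                bag = new ArrayList<>();
                bag.add(r.getValue());
            }
        }
        if (initialized) {
            // Flush the last group, even if its key is null.
            out.add(new SimpleEntry<>(curKey, bag));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList(
                rec(null, 1), rec(null, 2), rec("alice", 3), rec("bob", 4))));
    }
}
```

For the input (null, 1), (null, 2), ("alice", 3), ("bob", 4) this prints {{[null=[1, 2], alice=[3], bob=[4]]}}. The flag answers "has any record been seen?", while {{Objects.equals}} lets a null key be compared like any other key, so the two meanings of null no longer collide.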

> MultiQuery_Union_3 is failing with spark exec type
> --------------------------------------------------
>
>                 Key: PIG-5164
>                 URL: https://issues.apache.org/jira/browse/PIG-5164
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Adam Szita
>             Fix For: spark-branch
>
>         Attachments: PIG-5164.0.patch
>
>
> Outputs are different:
> First output:
> {code}
> diff MultiQuery_Union_3.out/1/out_sorted MultiQuery_Union_3_benchmark.out/1/out_sorted
> 0a1
> > {(,,,,)}    12110.0
> 6c7
> < {(alice falkner,,,,)}       19007.0
> ---
> > {(alice falkner,24,0.81,,911.81)}   6897.0
> {code}
> The second output is entirely different:
> {code}
> diff <(head -n 5 MultiQuery_Union_3.out/2/out_sorted) <(head -n 5 MultiQuery_Union_3_benchmark.out/2/out_sorted)
> 1,5c1,5
> < {(alice allen,69,1.95,socialist,499.63)}    2422.0
> < {(alice brown,76,1.52,socialist,791.95)}    10575.0
> < {(alice carson,66,1.01,socialist,421.71)}   2445.0
> < {(alice davidson,72,0.25,socialist,347.66)} 5104.0
> < {(alice ellison,67,1.96,socialist,557.02)}  2737.0
> ---
> > {(,,3.97,,)}        12110.0
> > {(alice allen,46,1.71,green,766.16)}        2422.0
> > {(alice brown,23,0.79,,917.19)}     10575.0
> > {(alice carson,24,1.49,democrat,607.27)}    2445.0
> > {(alice davidson,,0.66,,491.80)}    5104.0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
