[ 
https://issues.apache.org/jira/browse/PIG-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905272#comment-15905272
 ] 

Nandor Kollar commented on PIG-5167:
------------------------------------

[~knoguchi] I see. I think we might not even need the Tez-only Limit_12 test 
case; instead, we should add a sort before the limit in Limit_5, something like 
this:
{code}
a = load ':INPATH:/singlefile/studenttab10k';
b = load ':INPATH:/singlefile/votertab10k';
a1 = foreach a generate $0, $1;
b1 = foreach b generate $0, $1;
c = union a1, b1;
d = order c by $0;
e = limit d 100;
store e into ':OUTPATH:';
{code}
What do you think? This might not be the best option performance-wise, though, 
since it adds a sort over the whole union.
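
To illustrate why the sort matters, here is a rough sketch in Python (not Pig, and not part of the test harness). It assumes that a union gives no ordering guarantee, and models that by concatenating two made-up inputs in both possible orders. The function names and sample rows are hypothetical and chosen only for this illustration.

```python
# Toy stand-ins for the two inputs; the rows are made up for illustration.
students = [("bob", 22), ("alice", 66), ("carol", 30)]
voters = [("dave", 41), ("aaron", 19), ("erin", 55)]


def limit(rows, n):
    """Mimic a limit: keep the first n rows in whatever order they arrive."""
    return list(rows)[:n]


def order_then_limit(rows, n):
    """Mimic ordering by the first field ($0) followed by a limit of n."""
    return sorted(rows, key=lambda r: r[0])[:n]


# Assumption: the union may deliver rows from either input first.
union_ab = students + voters
union_ba = voters + students

# Without a sort, the limited result depends on arrival order.
unsorted_differs = limit(union_ab, 2) != limit(union_ba, 2)

# With a sort first, both arrival orders yield the same rows.
sorted_ab = order_then_limit(union_ab, 2)
sorted_ba = order_then_limit(union_ba, 2)
```

The keys in this sketch are unique. If several rows shared the same $0 value at the cutoff, the rows kept could still vary between runs.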

> Limit_4 is failing with spark exec type
> ---------------------------------------
>
>                 Key: PIG-5167
>                 URL: https://issues.apache.org/jira/browse/PIG-5167
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>             Fix For: spark-branch
>
>         Attachments: PIG-5167.patch
>
>
> results are different:
> {code}
> diff <(head -n 5 Limit_4.out/out_sorted) <(head -n 5 Limit_4_benchmark.out/out_sorted)
> 1,5c1,5
> <     50      3.00
> <     74      2.22
> < alice carson        66      2.42
> < alice quirinius     71      0.03
> < alice van buren     28      2.50
> ---
> > bob allen           0.28
> > bob allen   22      0.92
> > bob allen   25      2.54
> > bob allen   26      2.35
> > bob allen   27      2.17
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
