[ 
https://issues.apache.org/jira/browse/PIG-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5038:
------------------------------------
    Assignee: Konstantin Harasov
     Summary: Pig Limit_2 e2e test failed with sort check  (was: Pig e2e test 
failed with Sort check failed (TEST: Limit_2))

+1. Committed to trunk.

Thanks for finding a solution/workaround for this. It had been on my to-do 
list for a long time. By the definition of the sort option, -k1,3 should work 
and is what we should be using, since the order by is done on three columns. 
The test passes on Mac with -k1,3, where the sort command works as expected. 
I am not sure why the Linux implementation sorts incorrectly: for -k1,3 it 
actually gives the result of -k1,1.
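
One possible explanation, which this thread does not confirm: -k1,3 defines a single key that runs from the start of field 1 to the end of field 3, separators included, and a locale-aware sort may give tabs and periods no weight when collating that span. The Python sketch below models that assumption next to a field-by-field comparison. The sample rows are modeled on the listing in the ticket (the exact tab layout is assumed), and the helper names field_key, span_key and first_disorder are hypothetical.

```python
# Illustrative rows modeled on the ticket's listing: name<TAB>age<TAB>gpa,
# with an empty name column. The exact tab layout is an assumption.
PIG_ORDER = [
    "\t\t1.76",
    "\t\t1.95",
    "\t\t3.97",
    "\t18\t",
    "\t18\t0.41",
]


def field_key(line, nfields=3):
    """Compare the first nfields tab-separated fields one at a time,
    in the spirit of ORDER BY $0,$1,$2 (empty fields sort first)."""
    return tuple(line.split("\t")[:nfields])


def span_key(line):
    """Hypothetical model of a locale-aware comparison of the whole
    field 1..3 span: tabs and periods are dropped before comparing."""
    return "".join(ch for ch in line if ch.isalnum())


def first_disorder(lines, key):
    """Return the 1-based number of the first out-of-order line, or None."""
    for i in range(1, len(lines)):
        if key(lines[i - 1]) > key(lines[i]):
            return i + 1
    return None


print(first_disorder(PIG_ORDER, field_key))  # field by field: no disorder
print(first_disorder(PIG_ORDER, span_key))   # span model: flags the "\t18\t" row
print(sorted(PIG_ORDER, key=span_key))       # order the span model would produce
```

Under this model the rows with age 18 land between gpa 1.76 and 1.95, which resembles the -k 1,3 listing in the description. If the model holds, running the check with LC_ALL=C or with separate keys (-k1,1 -k2,2 -k3,3) might be worth trying; neither has been verified here.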



> Pig Limit_2 e2e test failed with sort check
> -------------------------------------------
>
>                 Key: PIG-5038
>                 URL: https://issues.apache.org/jira/browse/PIG-5038
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Konstantin Harasov
>            Assignee: Konstantin Harasov
>             Fix For: 0.17.0
>
>         Attachments: PIG-5038.patch
>
>
> {noformat}
> error: Going to run sort check command: sort -cs -t     -k 1,3 
> ./out/pigtest/../..-1475241304-nightly.conf/Limit_2.out/out_original
> /bin/sort: 
> ./out/pigtest/../..-1475241304-nightly.conf/Limit_2.out/out_original:27: 
> disorder:       18
> Sort check failed
> INFO: TestDriver::runTestGroup() at 706:Test Limit_2 FAILED at 1475241624
> Ending test Limit_2 at 1475241624
> {noformat}
> The test failed because of a difference between sorting in Pig {{(ORDER BY 
> $0,$1,$2)}} and {{sort -t $'\t' -k 1,3}} in bash.
> The problem is that empty fields are ordered differently by Pig's 
> {{ORDER BY}} and by {{sort}} in bash.
> See the example below for the file studentnulltab10k.
> *Pig*:
> {code:linenumbers=true}
>               
>               
>               
>               0.12
>               1.04
>               1.15
>               1.25
>               1.27
>               1.31
>               1.59
>               1.61
>               1.62
>               1.76
>               1.95
>               2.09
>               2.35
>               2.66
>               3.04
>               3.23
>               3.31
>               3.39
>               3.46
>               3.54
>               3.65
>               3.75
>               3.97
>       18      
>       18      0.41
> {code}
> *bash: sort -t $'\t' -k 1,3*
> {code:linenumbers=true}
>               
>               
>               
>               0.12
>               1.04
>               1.15
>               1.25
>               1.27
>               1.31
>               1.59
>               1.61
>               1.62
>               1.76
>       18      
>       18      0.41
>       18      0.54
>       18      1.78
>       18      2.46
>       18      2.54
>       19      0.07
>       19      0.27
>       19      0.39
>       19      2.27
>       19      2.50
>       19      2.60
>       19      2.89
>       19      3.87
>               1.95
> {code}
> *bash: sort -t $'\t' -k 1,2*
> {code:linenumbers=true}
>               
>               
>               
>               0.12
>               1.04
>               1.15
>               1.25
>               1.27
>               1.31
>               1.59
>               1.61
>               1.62
>               1.76
>               1.95
>               2.09
>               2.35
>               2.66
>               3.04
>               3.23
>               3.31
>               3.39
>               3.46
>               3.54
>               3.65
>               3.75
>               3.97
>       18      
>       18      0.41
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
