[ 
https://issues.apache.org/jira/browse/TAJO-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294789#comment-14294789
 ] 

ASF GitHub Bot commented on TAJO-1283:
--------------------------------------

GitHub user sirpkt opened a pull request:

    https://github.com/apache/tajo/pull/364

    TAJO-1283: ORDER BY with the first descending order causes wrong results

    As BaseTupleComparator already handles both ascending and descending sort
    keys, remove the unnecessary DescendingTupleRangeComparator class.
    - We do not need to consider whether the first sort key is ascending or
      descending in scheduleRangeShuffledFetches(), so the related code is removed.
    - Add TEST_MIN_TASK_NUM support to getNonLeafTaskNum() in Stage.java for
      testing purposes.
    
    'mvn clean install' passed.
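
The point that a single comparator can cover both sort directions can be illustrated with a minimal sketch. This is not Tajo's BaseTupleComparator: the class name DirectionAwareComparator, the long[] row representation, and the sorted() helper are all assumptions made for illustration only. The sample rows are taken from the issue description quoted below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for illustration; NOT Tajo's BaseTupleComparator.
// Rows are modeled as long[] for simplicity (an assumption).
final class DirectionAwareComparator implements Comparator<long[]> {
    private final int[] keyIndexes;    // columns to sort on, in priority order
    private final boolean[] ascending; // sort direction for each key

    DirectionAwareComparator(int[] keyIndexes, boolean[] ascending) {
        if (keyIndexes.length != ascending.length) {
            throw new IllegalArgumentException("one direction is required per sort key");
        }
        this.keyIndexes = keyIndexes.clone();
        this.ascending = ascending.clone();
    }

    @Override
    public int compare(long[] left, long[] right) {
        for (int i = 0; i < keyIndexes.length; i++) {
            long l = left[keyIndexes[i]];
            long r = right[keyIndexes[i]];
            // A descending key simply swaps the operands, so no separate
            // "descending comparator" class is needed.
            int cmp = ascending[i] ? Long.compare(l, r) : Long.compare(r, l);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0; // all sort keys are equal
    }

    // Returns a sorted copy of the rows, leaving the input untouched.
    static List<long[]> sorted(List<long[]> rows, int[] keys, boolean[] asc) {
        List<long[]> copy = new ArrayList<>(rows);
        copy.sort(new DirectionAwareComparator(keys, asc));
        return copy;
    }

    public static void main(String[] args) {
        List<long[]> rows = Arrays.asList(
            new long[] {3000000L, 167695L},
            new long[] {1L, 63700L},
            new long[] {3000000L, 61045L},
            new long[] {1L, 155190L});
        // ORDER BY l_orderkey DESC, l_partkey (ascending)
        for (long[] row : sorted(rows, new int[] {0, 1}, new boolean[] {false, true})) {
            System.out.println(row[0] + "|" + row[1]);
        }
    }
}
```

Because the direction is applied per key inside compare(), callers such as a range-shuffle scheduler would not need to branch on the direction of the first sort key, which is the simplification this pull request describes.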

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sirpkt/tajo TAJO-1283

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/364.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #364
    
----
commit 30a16e3f166ca34c9391a4dbad64bd2a92b58737
Author: Keuntae Park <[email protected]>
Date:   2015-01-28T06:37:07Z

    As BaseTupleComparator already handles both ascending and descending sort
    keys, remove the unnecessary DescendingTupleRangeComparator class.

    - We do not need to consider whether the first sort key is ascending or
      descending in scheduleRangeShuffledFetches().
    - Add TEST_MIN_TASK_NUM support to getNonLeafTaskNum() in Stage.java for
      testing purposes.

commit 61619daa98b3d7e89f902e6f587047af54a0a2de
Author: Keuntae Park <[email protected]>
Date:   2015-01-28T06:41:39Z

    Merge branch 'master' into TAJO-1283

----


> ORDER BY with the first descending order causes wrong results
> -------------------------------------------------------------
>
>                 Key: TAJO-1283
>                 URL: https://issues.apache.org/jira/browse/TAJO-1283
>             Project: Tajo
>          Issue Type: Bug
>          Components: distributed query plan, planner/optimizer
>            Reporter: Hyunsik Choi
>            Assignee: Jinho Kim
>            Priority: Critical
>             Fix For: 0.10
>
>
> Each ORDER BY key can be specified with ascending or descending order. 
> Recently, I found that ORDER BY with a descending first sort key produces 
> wrong results.
> If only the second key is descending, it works well. Other cases also work 
> correctly.
> {code}
> select l_orderkey, l_partkey from lineitem order by l_orderkey, l_partkey 
> desc;
> l_orderkey,  l_partkey
> -------------------------------
> 1,  155190
> 1,  67310
> 1,  63700
> 1,  24027
> 1,  15635
> 1,  2132
> 2,  106170
> 3,  183095
> 3,  128449
> 3,  62143
> 3,  29380
> 3,  19036
> 3,  4297
> ...
> {code}
> But if the first sort key is descending, the query returns the wrong number of 
> rows and wrong range partitions, as follows:
> {code}
> default> select l_orderkey, l_partkey from lineitem order by l_orderkey desc, 
> l_partkey;
> l_orderkey,  l_partkey
> -------------------------------
> 3000000,  61045
> 3000000,  159113
> 3000000,  167695
> 3000000,  167904
> 3000000,  196339
> ...
> {code}
> According to my investigation, it seems to be related to an offset problem in 
> RowFile or an index problem. The final result includes duplicated rows, and the 
> final row is wrong, as follows:
> {code:title=part-02-000000-000}
> 3000000|61045
> 3000000|159113
> 3000000|167695
> 3000000|167904
> 3000000|196339
> 2999975|28334
> 2999975|194023
> 2999974|8020
> 2999974|124152
> 2999974|129921
> 2999974|139248
> 2999974|168914
> 2999974|187923
> 2999973|30533
> 2999973|36196
> ...
> 2919713|133486
> 2919713|195963
> 2919712|86257
> 2919712|94542
> 2919712|107370
> 2919712|166342 <- duplicated rows
> 2919712|178277
> ....
> 1|63700
> 1|67310
> 1|155190
> [EOF]
> {code}
> {code:title=part-02-000001-000}
> |96127                     <- looks wrong
> 6000000|32255
> 6000000|96127
> 5999975|6452
> 5999975|7272
> 5999975|37131
> ....
> ....
> 2919713|133486
> 2919713|195963
> 2919712|94542
> 2919712|107370
> 2919712|166342    <- duplicated rows
> [EOF]
> {code}
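
Range partitioning is expected to send each row to exactly one partition, so no row should appear in more than one part file. The duplication flagged in the two part files above can be checked mechanically. The following is a minimal sketch, assuming each part file has already been read into a list of "key|value" strings; the class PartitionOverlapCheck and the method sharedRows are hypothetical names, not part of Tajo. Only rows visible in the excerpts are used, and the elided rows are left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of Tajo.
final class PartitionOverlapCheck {

    // Returns the rows present in both part files, in the order they
    // occur in the second file. An empty result means no overlap.
    static Set<String> sharedRows(List<String> first, List<String> second) {
        Set<String> seen = new HashSet<>(first);
        Set<String> shared = new LinkedHashSet<>();
        for (String row : second) {
            if (seen.contains(row)) {
                shared.add(row);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        // Rows visible in the part-02-000000-000 excerpt (subset).
        List<String> part0 = Arrays.asList(
            "2919713|133486", "2919713|195963", "2919712|86257",
            "2919712|94542", "2919712|107370", "2919712|166342",
            "2919712|178277");
        // Rows visible at the end of the part-02-000001-000 excerpt.
        List<String> part1 = Arrays.asList(
            "2919713|133486", "2919713|195963", "2919712|94542",
            "2919712|107370", "2919712|166342");
        // Any row printed here was written to both partitions.
        for (String row : sharedRows(part0, part1)) {
            System.out.println("duplicated: " + row);
        }
    }
}
```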



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
