Hyunsik Choi created TAJO-1283:
----------------------------------

             Summary: ORDER BY with the first descending order causes wrong results
                 Key: TAJO-1283
                 URL: https://issues.apache.org/jira/browse/TAJO-1283
             Project: Tajo
          Issue Type: Improvement
          Components: distributed query plan, planner/optimizer
            Reporter: Hyunsik Choi
            Priority: Critical
             Fix For: 0.10


Each ORDER BY key can be specified with ascending or descending order. 
Recently, I found that ORDER BY produces wrong results when the first sort 
key is in descending order.

If the second key is in descending order, it works well. Other cases also work correctly.
{code}
select l_orderkey, l_partkey from lineitem order by l_orderkey, l_partkey desc;

l_orderkey,  l_partkey
-------------------------------
1,  155190
1,  67310
1,  63700
1,  24027
1,  15635
1,  2132
2,  106170
3,  183095
3,  128449
3,  62143
3,  29380
3,  19036
3,  4297
...
{code}

However, if the first sort key is in descending order, the query returns the 
wrong number of rows and shows the wrong part of the key range, as follows:

{code}
default> select l_orderkey, l_partkey from lineitem order by l_orderkey desc, l_partkey;
l_orderkey,  l_partkey
-------------------------------
3000000,  61045
3000000,  159113
3000000,  167695
3000000,  167904
3000000,  196339
...
{code}
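
For reference, the order that "l_orderkey desc, l_partkey" is expected to produce can be modeled outside Tajo. The following is only a minimal sketch, not Tajo code: the helper name sort_rows and the sample rows (picked from the listings in this report) are assumptions made for illustration.

```python
# Hypothetical helper, not part of Tajo: models a multi-key ORDER BY
# where each key has its own direction.
def sort_rows(rows, descending):
    """Sort tuples by several columns.

    descending[i] is True when column i is sorted in descending order.
    """
    result = list(rows)
    # Python's sort is stable, so sorting from the last key to the first
    # makes earlier keys take precedence over later ones.
    for col in reversed(range(len(descending))):
        result.sort(key=lambda r: r[col], reverse=descending[col])
    return result


# A few (l_orderkey, l_partkey) pairs taken from the first listing above.
rows = [(1, 155190), (3, 4297), (1, 2132), (2, 106170), (3, 183095)]

# ORDER BY l_orderkey DESC, l_partkey ASC
for row in sort_rows(rows, descending=[True, False]):
    print(row)
```

With a correct sort, the largest l_orderkey in the table comes first, and l_partkey ascends within each l_orderkey.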

According to my investigation, it seems to be related to an offset problem 
in RowFile or an index problem. The final result includes duplicated rows, 
and the final row is wrong, as follows:


{code:title=part-02-000000-000}
3000000|61045
3000000|159113
3000000|167695
3000000|167904
3000000|196339
2999975|28334
2999975|194023
2999974|8020
2999974|124152
2999974|129921
2999974|139248
2999974|168914
2999974|187923
2999973|30533
2999973|36196
...
2919713|133486
2919713|195963
2919712|86257
2919712|94542
2919712|107370
2919712|166342 <- duplicated rows
2919712|178277
....
1|63700
1|67310
1|155190
[EOF]
{code}

{code:title=part-02-000001-000}
|96127                     <- looks wrong
6000000|32255
6000000|96127
5999975|6452
5999975|7272
5999975|37131
....
....
2919713|133486
2919713|195963
2919712|94542
2919712|107370
2919712|166342    <- duplicated rows
[EOF]
{code}
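
The symptoms in the two part files above can also be checked mechanically. The following is a rough sketch under stated assumptions, not Tajo code: the function names are hypothetical, the sample lines are copied from the excerpts above with the elided portions skipped, and treating an identical (l_orderkey, l_partkey) pair in two part files as a duplicate is itself an assumption, since that pair is not guaranteed to be unique in lineitem.

```python
def parse_row(line):
    """Parse 'l_orderkey|l_partkey'; an empty field becomes None."""
    return tuple(int(f) if f else None for f in line.strip().split("|"))


def check_parts(part_a, part_b):
    """Return (malformed, out_of_order, shared) for two part files.

    The expected order is l_orderkey DESC, l_partkey ASC.
    """
    rows_a = [parse_row(l) for l in part_a]
    rows_b = [parse_row(l) for l in part_b]
    # Rows with a missing field, such as '|96127'.
    malformed = [r for r in rows_a + rows_b if None in r]
    valid_a = [r for r in rows_a if None not in r]
    valid_b = [r for r in rows_b if None not in r]

    def key(r):
        # Negate the first column so that descending order compares as ascending.
        return (-r[0], r[1])

    out_of_order = []
    for rows in (valid_a, valid_b):
        # Adjacent pairs that violate the expected order within one file.
        out_of_order += [(x, y) for x, y in zip(rows, rows[1:]) if key(x) > key(y)]
    # Rows that appear in both part files.
    shared = sorted(set(valid_a) & set(valid_b), key=key)
    return malformed, out_of_order, shared


part_0 = ["2919713|133486", "2919713|195963", "2919712|86257", "2919712|94542",
          "2919712|107370", "2919712|166342", "2919712|178277"]
part_1 = ["|96127", "6000000|32255", "6000000|96127", "5999975|6452",
          "2919713|133486", "2919713|195963", "2919712|94542",
          "2919712|107370", "2919712|166342"]

malformed, out_of_order, shared = check_parts(part_0, part_1)
print("malformed:", malformed)
print("out of order:", out_of_order)
print("in both parts:", shared)
```

On these excerpts the rows inside each file are in the expected order. What the check flags is the malformed first row of the second file and the rows that appear in both files.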



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
