[
https://issues.apache.org/jira/browse/TAJO-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294830#comment-14294830
]
Keuntae Park commented on TAJO-1283:
------------------------------------
Sure, [~jhkim].
BaseTupleComparator can already handle all ORDER BY cases properly,
including the case where the first sort key is descending. However,
scheduleRangeShuffledFetches() additionally reverses the comparison result
through DescendingTupleRangeComparator when the first sort key is descending,
and I think this extra reversal is unnecessary.
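To illustrate the point, here is a minimal Java sketch that does not use Tajo's classes. The hypothetical tupleComparator plays the role of BaseTupleComparator (it already negates the result for each descending key), and Comparator.reversed() stands in for the extra reversal applied through DescendingTupleRangeComparator. SortSpec and firstColumnAfterSort are also made up for this example. With ORDER BY l_orderkey DESC, l_partkey, reversing once more turns the descending first key back into ascending order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch: reversing a comparator that already applies per-key directions
// flips every key, so the DESC first key ends up ascending.
final class DoubleReversalDemo {

    // Hypothetical stand-in for one ORDER BY key: a column index and a direction.
    static final class SortSpec {
        final int column;
        final boolean ascending;

        SortSpec(int column, boolean ascending) {
            this.column = column;
            this.ascending = ascending;
        }
    }

    // Hypothetical stand-in for BaseTupleComparator: it already negates the
    // result for every descending key, including the first one.
    static Comparator<int[]> tupleComparator(List<SortSpec> specs) {
        return (a, b) -> {
            for (SortSpec s : specs) {
                int c = Integer.compare(a[s.column], b[s.column]);
                if (c != 0) {
                    return s.ascending ? c : -c;
                }
            }
            return 0;
        };
    }

    // Sorts a copy of the rows with the given comparator and returns the first column.
    static List<Integer> firstColumnAfterSort(List<int[]> rows, Comparator<int[]> cmp) {
        List<int[]> copy = new ArrayList<>(rows);
        copy.sort(cmp);
        List<Integer> keys = new ArrayList<>();
        for (int[] row : copy) {
            keys.add(row[0]);
        }
        return keys;
    }

    public static void main(String[] args) {
        // ORDER BY l_orderkey DESC, l_partkey
        List<SortSpec> specs = Arrays.asList(new SortSpec(0, false), new SortSpec(1, true));
        Comparator<int[]> base = tupleComparator(specs);
        // Extra reversal on top, analogous to the additional DescendingTupleRangeComparator step.
        Comparator<int[]> reversedAgain = base.reversed();

        List<int[]> rows = Arrays.asList(
                new int[] {1, 155190}, new int[] {3000000, 61045}, new int[] {2, 106170});

        System.out.println("direction-aware only: " + firstColumnAfterSort(rows, base));
        System.out.println("reversed again:       " + firstColumnAfterSort(rows, reversedAgain));
    }
}
```

Running main prints [3000000, 2, 1] for the direction-aware comparator and [1, 2, 3000000] once it is reversed again.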
> ORDER BY with the first descending order causes wrong results
> -------------------------------------------------------------
>
> Key: TAJO-1283
> URL: https://issues.apache.org/jira/browse/TAJO-1283
> Project: Tajo
> Issue Type: Bug
> Components: distributed query plan, planner/optimizer
> Reporter: Hyunsik Choi
> Assignee: Keuntae Park
> Priority: Critical
> Fix For: 0.10
>
>
> Each ORDER BY key can be specified in ascending or descending order.
> Recently, I found that ORDER BY with a descending first sort key returns
> wrong results.
> If the second key is in descending order, it works well. Other cases also
> work correctly.
> {code}
> select l_orderkey, l_partkey from lineitem order by l_orderkey, l_partkey desc;
> l_orderkey, l_partkey
> -------------------------------
> 1, 155190
> 1, 67310
> 1, 63700
> 1, 24027
> 1, 15635
> 1, 2132
> 2, 106170
> 3, 183095
> 3, 128449
> 3, 62143
> 3, 29380
> 3, 19036
> 3, 4297
> ...
> {code}
> However, if the first sort key is in descending order, the query returns the
> wrong number of rows and shows the wrong range partition, as follows:
> {code}
> default> select l_orderkey, l_partkey from lineitem order by l_orderkey desc, l_partkey;
> l_orderkey, l_partkey
> -------------------------------
> 3000000, 61045
> 3000000, 159113
> 3000000, 167695
> 3000000, 167904
> 3000000, 196339
> ...
> {code}
> According to my investigation, it seems to be related to an offset problem in
> RowFile or to an index problem. The final result includes duplicated rows, and
> the final row is wrong, as follows:
> {code:title=part-02-000000-000}
> 3000000|61045
> 3000000|159113
> 3000000|167695
> 3000000|167904
> 3000000|196339
> 2999975|28334
> 2999975|194023
> 2999974|8020
> 2999974|124152
> 2999974|129921
> 2999974|139248
> 2999974|168914
> 2999974|187923
> 2999973|30533
> 2999973|36196
> ...
> 2919713|133486
> 2919713|195963
> 2919712|86257
> 2919712|94542
> 2919712|107370
> 2919712|166342 <- duplicated rows
> 2919712|178277
> ....
> 1|63700
> 1|67310
> 1|155190
> [EOF]
> {code}
> {code:title=part-02-000001-000}
> |96127 <- looks wrong
> 6000000|32255
> 6000000|96127
> 5999975|6452
> 5999975|7272
> 5999975|37131
> ....
> ....
> 2919713|133486
> 2919713|195963
> 2919712|94542
> 2919712|107370
> 2919712|166342 <- duplicated rows
> [EOF]
> {code}
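The duplicated rows above break a basic invariant of a range-shuffled sort: each key has to fall into exactly one [start, end) range, compared with the same ordering that produced the range bounds. The following Java sketch only illustrates that invariant and does not reproduce the exact failure; RangeAssignmentDemo, assign, and eachKeyExactlyOnce are hypothetical names, not Tajo APIs. In this toy case, checking descending bounds with an ascending comparator drops every key rather than duplicating it, but either way the exactly-once check fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch: assign each sort key to the range partition whose [start, end)
// bounds contain it, using one comparator for both the bounds and the check.
final class RangeAssignmentDemo {

    // Returns, for each partition i, the keys lying in [bounds[i], bounds[i + 1])
    // under cmp. The last partition is closed at the top so the final bound is kept.
    static List<List<Integer>> assign(List<Integer> keys, int[] bounds, Comparator<Integer> cmp) {
        List<List<Integer>> parts = new ArrayList<>();
        int n = bounds.length - 1;
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        for (int key : keys) {
            for (int i = 0; i < n; i++) {
                boolean last = i == n - 1;
                boolean afterStart = cmp.compare(bounds[i], key) <= 0;
                int endCmp = cmp.compare(key, bounds[i + 1]);
                boolean beforeEnd = last ? endCmp <= 0 : endCmp < 0;
                if (afterStart && beforeEnd) {
                    parts.get(i).add(key);
                }
            }
        }
        return parts;
    }

    // True if every (distinct) key lands in exactly one partition.
    static boolean eachKeyExactlyOnce(List<Integer> keys, List<List<Integer>> parts) {
        for (int key : keys) {
            int hits = 0;
            for (List<Integer> part : parts) {
                for (int k : part) {
                    if (k == key) {
                        hits++;
                    }
                }
            }
            if (hits != 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(6000000, 3000001, 3000000, 2919712, 1);
        // Bounds for ORDER BY l_orderkey DESC, listed in descending order.
        int[] bounds = {6000000, 3000000, 1};

        List<List<Integer>> consistent = assign(keys, bounds, Comparator.reverseOrder());
        System.out.println(consistent + " ok=" + eachKeyExactlyOnce(keys, consistent));

        // Descending bounds checked with an ascending comparator.
        List<List<Integer>> mismatched = assign(keys, bounds, Comparator.naturalOrder());
        System.out.println(mismatched + " ok=" + eachKeyExactlyOnce(keys, mismatched));
    }
}
```

With consistent directions, main prints [[6000000, 3000001], [3000000, 2919712, 1]] ok=true; with the mismatched comparator it prints [[], []] ok=false.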
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)