[
https://issues.apache.org/jira/browse/DRILL-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538988#comment-14538988
]
Jinfeng Ni commented on DRILL-2953:
-----------------------------------
One more case:
{code}
select cast(columns[0] as int) as nation_key from
dfs_test.`file:/Users/jni/work/incubator-drill/exec/java-exec/target/test-classes/store/text/data/nations.csv`
group by columns[0], cast(columns[1] as int), columns[2] order by
columns[0], columns[2]
text json
00-00 Screen
00-01 Project(nation_key=[$0])
00-02 SelectionVectorRemover
00-03 Sort(sort0=[$1], sort1=[$2], dir0=[ASC], dir1=[ASC])
00-04 Project(nation_key=[CAST($0):INTEGER], EXPR$1=[$0], EXPR$2=[$2])
00-05 StreamAgg(group=[{0, 1, 2}])
00-06 Sort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC])
00-07 Project($f0=[ITEM($0, 0)], $f1=[CAST(ITEM($0, 1)):INTEGER], $f2=[ITEM($0, 2)])
00-08 Scan(groupscan=
{code}
> Group By + Order By query results are not ordered.
> --------------------------------------------------
>
> Key: DRILL-2953
> URL: https://issues.apache.org/jira/browse/DRILL-2953
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 0.9.0
> Environment: 10833d2cae9f5312cf0e31f8c9f3f8a9dcdc0c45 | Commit 0.9.0
> release version. | 03.05.2015 @ 14:56:56 EDT
> Reporter: Khurram Faraaz
> Assignee: Jinfeng Ni
> Priority: Critical
> Fix For: 1.0.0
>
> Attachments:
> 0001-DRILL-2953-Ensure-sort-would-be-enforced-when-a-cast.patch
>
>
> A GROUP BY + ORDER BY query does not return results in the correct order. The
> sort is performed before the aggregation rather than on the final result,
> which should not be the case.
> The test was performed on a 4-node cluster running CentOS.
> {code}
> 0: jdbc:drill:> select cast(columns[0] as int) c1 from `testWindow.csv` t2
> where t2.columns[0] is not null group by columns[0] order by columns[0];
> +------------+
> | c1 |
> +------------+
> | 10 |
> | 100 |
> | 113 |
> | 119 |
> | 2 |
> | 50 |
> | 55 |
> | 57 |
> | 61 |
> | 67 |
> | 89 |
> +------------+
> 11 rows selected (0.218 seconds)
> {code}
> Explain plan for the query that returns the wrong results:
> {code}
> 0: jdbc:drill:> explain plan for select cast(columns[0] as int) c1 from
> `testWindow.csv` t2 where t2.columns[0] is not null group by columns[0] order
> by columns[0];
> +------------+------------+
> | text | json |
> +------------+------------+
> | 00-00 Screen
> 00-01 Project(c1=[$0])
> 00-02 Project(c1=[CAST($0):INTEGER], EXPR$1=[$0])
> 00-03 StreamAgg(group=[{0}])
> 00-04 Sort(sort0=[$0], dir0=[ASC])
> 00-05 Filter(condition=[IS NOT NULL($0)])
> 00-06 Project(ITEM=[ITEM($0, 0)])
> 00-07 Scan(groupscan=[EasyGroupScan [selectionRoot=/tmp/testWindow.csv, numFiles=1, columns=[`columns`[0]], files=[maprfs:/tmp/testWindow.csv]]])
> {code}
> The same query without the column alias also returns results that are not in order:
> {code}
> 0: jdbc:drill:> select cast(columns[0] as int) from `testWindow.csv` t2 where
> t2.columns[0] is not null group by columns[0] order by columns[0];
> +------------+
> | EXPR$0 |
> +------------+
> | 10 |
> | 100 |
> | 113 |
> | 119 |
> | 2 |
> | 50 |
> | 55 |
> | 57 |
> | 61 |
> | 67 |
> | 89 |
> +------------+
> 11 rows selected (0.214 seconds)
> {code}
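
The ordering in the result tables above is consistent with the plan as shown: the only Sort (00-04) runs on the raw text value ITEM($0, 0) below the StreamAgg, and nothing re-sorts after the CAST in 00-02. A minimal Python sketch of that operator order follows. It is not Drill code; the function name `run_plan_sketch` and the input rows (their order, the duplicates, and the null) are assumptions made for illustration. Only the eleven distinct values are taken from the output above.

```python
def run_plan_sketch(raw_values):
    """Apply the operators from the explain plan, bottom-up, to text values."""
    # 00-05 Filter(IS NOT NULL): drop missing values.
    non_null = [v for v in raw_values if v is not None]
    # 00-04 Sort(sort0=[$0]): the key is still text, so the order is lexicographic.
    sorted_text = sorted(non_null)
    # 00-03 StreamAgg(group=[{0}]): collapse adjacent equal keys.
    grouped = []
    for value in sorted_text:
        if not grouped or grouped[-1] != value:
            grouped.append(value)
    # 00-02 Project(CAST($0):INTEGER): cast only; no Sort follows this step.
    return [int(value) for value in grouped]


# Hypothetical file contents: arbitrary row order, two duplicates and one null.
rows = ["57", "2", "100", None, "89", "10", "119", "50",
        "2", "61", "113", "67", "55", "100"]

result = run_plan_sketch(rows)
print(result)          # order produced by the sketched plan
print(sorted(result))  # numeric order, for comparison
```

With these inputs the sketch yields the same sequence as the reported result (10, 100, 113, 119, 2, ...), whereas a numeric ordering would start with 2.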