Paul Rogers created DRILL-5143:
----------------------------------
Summary: Planner creates redundant sort for nested query on single
fragment
Key: DRILL-5143
URL: https://issues.apache.org/jira/browse/DRILL-5143
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Paul Rogers
Priority: Minor
The unit test {{TestWindowFrame.testUnboundedFollowing}} identified an
optimization opportunity in the planner. It uses the following query:
{code}
SELECT
position_id,
employee_id,
MAX(employee_id) OVER(PARTITION BY position_id) AS `last_value`
FROM (
SELECT *
FROM dfs_test.`%s/window/b4.p4`
ORDER BY position_id, employee_id )
{code}
Which produces the following (heavily elided) plan:
{code}
"graph" : [ {
"pop" : "fs-scan",
...
}, {
"pop" : "project",
...
}, {
"pop" : "external-sort",
"orderings" : [ {
"order" : "ASC",
"expr" : "`position_id`",
"nullDirection" : "UNSPECIFIED"
}, {
"order" : "ASC",
"expr" : "`employee_id`",
"nullDirection" : "UNSPECIFIED"
} ],
...
}, {
"pop" : "selection-vector-remover",
...
}, {
"pop" : "project",
...
"exprs" : [ {
"ref" : "`$0`",
"expr" : "`T0¦¦position_id`"
}...
}, {
"pop" : "external-sort",
...
"orderings" : [ {
"order" : "ASC",
"expr" : "`$0`",
"nullDirection" : "UNSPECIFIED"
} ],
...
}, {
"pop" : "selection-vector-remover",
...
}, {
"pop" : "window",
...
"aggregations" : [ {
"ref" : "`w0$o0`",
"expr" : "max(`$1`) "
} ],
...
}, {
"pop" : "project",
...
}, {
"pop" : "screen",
...
} ]
{code}
Note that two sorts are stacked one atop the other. This is a "degenerate"
plan; normally a shuffle operation would occur between the two sorts in a
distributed query. But, because the query is so small, it runs on a single node
and has a redundant sort.
Either:
* Note that the data is already sorted and omit the downstream sort, or
* Note that the inner query sort is ignored by downstream operators and discard
the inner sort.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)