[
https://issues.apache.org/jira/browse/DRILL-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aman Sinha updated DRILL-3404:
------------------------------
Attachment: 0001-DRILL-3304-Use-SUM-instead-of-SUM0-for-window-aggreg.patch
Uploaded Drill side changes - unit test and update to pom.xml. Note that the
commit message mistakenly says DRILL-3304 instead of DRILL-3404.
> Filter on window function does not appear in query plan
> -------------------------------------------------------
>
> Key: DRILL-3404
> URL: https://issues.apache.org/jira/browse/DRILL-3404
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.1.0
> Environment: 4 node cluster on CentOS
> Reporter: Khurram Faraaz
> Assignee: Aman Sinha
> Priority: Critical
> Labels: window_function
> Fix For: 1.1.0
>
> Attachments:
> 0001-DRILL-3304-Use-SUM-instead-of-SUM0-for-window-aggreg.patch,
> 0001-Don-t-use-SumEmptyIsZero-SUM0-window-aggregate-until.patch, 0_0_0.parquet
>
>
> Filter is missing in the query plan for the below query in Drill, and hence
> wrong results are returned.
> Results from Drill
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select c1, c2, w_sum from ( select c1, c2, sum
> ( c1 ) over ( partition by c2 order by c1 asc nulls first ) w_sum from
> `tblWnulls` ) sub_query where w_sum is not null;
> +-------------+-------+-------------+
> | c1 | c2 | w_sum |
> +-------------+-------+-------------+
> | 0 | a | 0 |
> | 1 | a | 1 |
> | 5 | a | 6 |
> | 10 | a | 16 |
> | 11 | a | 27 |
> | 14 | a | 41 |
> | 11111 | a | 11152 |
> | 2 | b | 2 |
> | 9 | b | 11 |
> | 13 | b | 24 |
> | 17 | b | 41 |
> | null | c | null |
> | 4 | c | 4 |
> | 6 | c | 10 |
> | 8 | c | 18 |
> | 12 | c | 30 |
> | 13 | c | 56 |
> | 13 | c | 56 |
> | null | d | null |
> | null | d | null |
> | 10 | d | 10 |
> | 11 | d | 21 |
> | 2147483647 | d | 4294967315 |
> | 2147483647 | d | 4294967315 |
> | -1 | e | -1 |
> | 15 | e | 14 |
> | null | null | null |
> | 19 | null | 19 |
> | 65536 | null | 65555 |
> | 1000000 | null | 1065555 |
> +-------------+-------+-------------+
> 30 rows selected (0.337 seconds)
> {code}
> Explain plan for the above query from Drill
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select c1, c2, w_sum from (
> select c1, c2, sum ( c1 ) over ( partition by c2 order by c1 asc nulls first
> ) w_sum from `tblWnulls` ) sub_query where w_sum is not null;
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
> |
>
>
>
> text
>
>
>
> | json |
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
> | 00-00 Screen
> 00-01 Project(c1=[$0], c2=[$1], w_sum=[$2])
> 00-02 Project(c1=[$0], c2=[$1], w_sum=[CASE(>($2, 0), $3, null)])
> 00-03 Window(window#0=[window(partition {1} order by [0
> ASC-nulls-first] range between UNBOUNDED PRECEDING and CURRENT ROW aggs
> [COUNT($0), $SUM0($0)])])
> 00-04 SelectionVectorRemover
> 00-05 Sort(sort0=[$1], sort1=[$0], dir0=[ASC],
> dir1=[ASC-nulls-first])
> 00-06 Project(c1=[$1], c2=[$0])
> 00-07 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/tblWnulls]],
> selectionRoot=/tmp/tblWnulls, numFiles=1, columns=[`c1`, `c2`]]])
> {code}
> Results from Postgres 9.3
> {code}
> postgres=# select c1, c2, w_sum from ( select c1, c2, sum ( c1 ) over (
> partition by c2 order by c1 asc nulls first ) w_sum from t222 ) sub_query
> where w_sum is not null;
> c1 | c2 | w_sum
> ------------+----+------------
> 0 | a | 0
> 1 | a | 1
> 5 | a | 6
> 10 | a | 16
> 11 | a | 27
> 14 | a | 41
> 11111 | a | 11152
> 2 | b | 2
> 9 | b | 11
> 13 | b | 24
> 17 | b | 41
> 4 | c | 4
> 6 | c | 10
> 8 | c | 18
> 12 | c | 30
> 13 | c | 56
> 13 | c | 56
> 10 | d | 10
> 11 | d | 21
> 2147483647 | d | 4294967315
> 2147483647 | d | 4294967315
> -1 | e | -1
> 15 | e | 14
> 19 | | 19
> 65536 | | 65555
> 1000000 | | 1065555
> (26 rows)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)