[
https://issues.apache.org/jira/browse/DRILL-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574942#comment-14574942
]
Deneche A. Hakim commented on DRILL-3254:
-----------------------------------------
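Calcite has a convertlet, StandardConvertletTable.AvgVarianceConvertlet, that
reduces window AVG(X) to SUM(X) / COUNT(X) without introducing a CASTHIGH
around the sum. For an integer column the division is then presumably done in
integer arithmetic, which would explain the truncated values in the report.
Disabling it seems to give correct results: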
{noformat}
0: jdbc:drill:zk=local> select avg(salary) over(partition by position_id order
by sub) from `windowData/b1.p1`;
+---------------------+
| EXPR$0              |
+---------------------+
| 11.0                |
| 11.666666666666666  |
| 11.666666666666666  |
| 12.333333333333334  |
| 12.333333333333334  |
| 12.333333333333334  |
| 13.0                |
| 13.0                |
| 13.0                |
| 13.0                |
| 13.666666666666666  |
| 13.666666666666666  |
| 13.666666666666666  |
| 13.666666666666666  |
| 13.666666666666666  |
| 14.25               |
| 14.25               |
| 14.25               |
| 14.25               |
| 14.25               |
+---------------------+
20 rows selected (0.609 seconds)
{noformat}
With that rewrite disabled, however, AVG is no longer reduced and shows up
as-is in the plan:
{noformat}
explain plan for select avg(salary) over(partition by position_id) from
`windowData/b1.p1`;
00-00    Screen
00-01      Project(EXPR$0=[$0])
00-02        Project(w0$o0=[$3])
00-03          Window(window#0=[window(partition {2} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [AVG($1)])])
00-04            SelectionVectorRemover
00-05              Sort(sort0=[$2], dir0=[ASC])
00-06                Project(T0¦¦*=[$0], salary=[$1], position_id=[$2])
00-07                  Scan(groupscan=[EasyGroupScan [selectionRoot=/Users/hakim/MapR/data/windowData/b1.p1, numFiles=1, columns=[`*`], files=[file:/Users/hakim/MapR/data/windowData/b1.p1/0.data.json]]])
{noformat}
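To illustrate why the missing CASTHIGH matters, here is a minimal Python
sketch (not Drill or Calcite code). It assumes that the role of CASTHIGH is to
widen the sum to a floating-point type before the division, and that without
it SUM(X) / COUNT(X) on an integer column is an integer division. The
per-store values are read off the GROUP BY output in the quoted report,
assuming one row per s_store_sk; running_avg and its widen flag are
hypothetical names used only for this illustration.

```python
# Per-store s_number_employees values, taken from the GROUP BY output in the
# quoted report (assumption: one row per store, so each AVG there is the raw
# column value).
employees = {
    "Fairview": [288, 278, 294],
    "Midway": [245, 236, 236, 218, 229, 297, 271],
}


def running_avg(values, widen):
    """Running AVG computed as SUM / COUNT over the rows seen so far.

    widen=False mimics the reduction without a cast: integer sum divided by
    integer count, which truncates. widen=True casts the sum to float first,
    which is the role assumed here for CASTHIGH.
    """
    out, total = [], 0
    for count, value in enumerate(values, start=1):
        total += value
        out.append(float(total) / count if widen else total // count)
    return out


for city, values in employees.items():
    print(city, "truncated:", running_avg(values, widen=False))
    print(city, "widened:  ", running_avg(values, widen=True))
```

The truncating variant reproduces the integer column Drill returned in the
report (288, 283, 286 for Fairview), while the widened variant matches the
Postgres values (for example 240.5 and 233.75 for the second and fourth
Midway rows).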
> Average over window functions returns wrong results
> ---------------------------------------------------
>
> Key: DRILL-3254
> URL: https://issues.apache.org/jira/browse/DRILL-3254
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.1.0
> Reporter: Abhishek Girish
> Assignee: Deneche A. Hakim
> Labels: window_function
> Fix For: 1.1.0
>
>
> Average function on numeric column returns an (inaccurate) integer value,
> instead of an (accurate) decimal (or floating point) value.
> *Results from Drill:*
> {code:sql}
> > select s_city, s_store_sk, avg(s_number_employees) over (PARTITION BY
> > s_city ORDER BY s_store_sk) from store limit 10;
> +-----------+-------------+---------+
> | s_city    | s_store_sk  | EXPR$2  |
> +-----------+-------------+---------+
> | Fairview  | 5           | 288     |
> | Fairview  | 8           | 283     |
> | Fairview  | 12          | 286     |
> | Midway    | 1           | 245     |
> | Midway    | 2           | 240     |
> | Midway    | 3           | 239     |
> | Midway    | 4           | 233     |
> | Midway    | 6           | 232     |
> | Midway    | 7           | 243     |
> | Midway    | 9           | 247     |
> +-----------+-------------+---------+
> 10 rows selected (0.197 seconds)
> {code}
> *Results from Postgres:*
> {code:sql}
> # select s_city, s_store_sk, avg(s_number_employees) over (PARTITION BY
> s_city ORDER BY s_store_sk) from store limit 10;
>   s_city  | s_store_sk |         avg
> ----------+------------+----------------------
>  Fairview |          5 | 288.0000000000000000
>  Fairview |          8 | 283.0000000000000000
>  Fairview |         12 | 286.6666666666666667
>  Midway   |          1 | 245.0000000000000000
>  Midway   |          2 | 240.5000000000000000
>  Midway   |          3 | 239.0000000000000000
>  Midway   |          4 | 233.7500000000000000
>  Midway   |          6 | 232.8000000000000000
>  Midway   |          7 | 243.5000000000000000
>  Midway   |          9 | 247.4285714285714286
> (10 rows)
> {code}
> Drill returns right results without window functions:
> {code:sql}
> > select s_city, s_store_sk, avg(s_number_employees) from store group by
> > s_city, s_store_sk order by 1,2 limit 10;
> +-----------+-------------+---------+
> | s_city    | s_store_sk  | EXPR$2  |
> +-----------+-------------+---------+
> | Fairview  | 5           | 288.0   |
> | Fairview  | 8           | 278.0   |
> | Fairview  | 12          | 294.0   |
> | Midway    | 1           | 245.0   |
> | Midway    | 2           | 236.0   |
> | Midway    | 3           | 236.0   |
> | Midway    | 4           | 218.0   |
> | Midway    | 6           | 229.0   |
> | Midway    | 7           | 297.0   |
> | Midway    | 9           | 271.0   |
> +-----------+-------------+---------+
> 10 rows selected (0.306 seconds)
> {code}