[
https://issues.apache.org/jira/browse/DRILL-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272822#comment-14272822
]
Deneche A. Hakim commented on DRILL-1908:
-----------------------------------------
I cleaned the code and added support for _order by_ clause, I also added a new
unit test for the _order by_ feature. Here is an example of a query with an
_order by_ clause:
{noformat}
SELECT employee_id, position_id, salary, SUM(salary) OVER (PARTITION BY
position_id ORDER BY salary) AS sum_sal FROM cp.`employee.json` ORDER BY
position_id LIMIT 50;
+-------------+-------------+------------+------------+
| employee_id | position_id | salary | sum_sal |
+-------------+-------------+------------+------------+
| 1 | 1 | 80000.0 | 80000.0 |
| 20 | 2 | 30000.0 | 30000.0 |
| 22 | 2 | 35000.0 | 135000.0 |
| 5 | 2 | 35000.0 | 135000.0 |
| 21 | 2 | 35000.0 | 135000.0 |
| 4 | 2 | 40000.0 | 215000.0 |
| 2 | 2 | 40000.0 | 215000.0 |
| 6 | 3 | 25000.0 | 25000.0 |
| 7 | 4 | 15000.0 | 15000.0 |
| 10 | 5 | 50000.0 | 50000.0 |
| 37 | 6 | 6700.0 | 6700.0 |
| 38 | 6 | 8000.0 | 14700.0 |
| 39 | 6 | 10000.0 | 34700.0 |
| 40 | 6 | 10000.0 | 34700.0 |
| 42 | 7 | 5000.0 | 5000.0 |
| 41 | 7 | 8500.0 | 13500.0 |
| 36 | 7 | 45000.0 | 58500.0 |
| 44 | 8 | 5000.0 | 5000.0 |
| 43 | 8 | 6700.0 | 11700.0 |
| 49 | 9 | 5000.0 | 20000.0 |
| 52 | 9 | 5000.0 | 20000.0 |
| 51 | 9 | 5000.0 | 20000.0 |
| 50 | 9 | 5000.0 | 20000.0 |
| 47 | 9 | 6500.0 | 26500.0 |
| 46 | 9 | 6600.0 | 33100.0 |
| 45 | 9 | 6800.0 | 39900.0 |
| 48 | 9 | 7200.0 | 47100.0 |
| 34 | 11 | 7000.0 | 7000.0 |
| 29 | 11 | 8500.0 | 24000.0 |
| 12 | 11 | 8500.0 | 24000.0 |
| 17 | 11 | 10000.0 | 54000.0 |
| 19 | 11 | 10000.0 | 54000.0 |
| 8 | 11 | 10000.0 | 54000.0 |
| 35 | 11 | 11000.0 | 65000.0 |
| 15 | 11 | 12000.0 | 89000.0 |
| 25 | 11 | 12000.0 | 89000.0 |
| 28 | 11 | 14000.0 | 117000.0 |
| 31 | 11 | 14000.0 | 117000.0 |
| 26 | 11 | 15000.0 | 192000.0 |
| 11 | 11 | 15000.0 | 192000.0 |
| 13 | 11 | 15000.0 | 192000.0 |
| 23 | 11 | 15000.0 | 192000.0 |
| 30 | 11 | 15000.0 | 192000.0 |
| 27 | 11 | 16000.0 | 208000.0 |
| 16 | 11 | 17000.0 | 327000.0 |
| 14 | 11 | 17000.0 | 327000.0 |
| 33 | 11 | 17000.0 | 327000.0 |
| 24 | 11 | 17000.0 | 327000.0 |
| 9 | 11 | 17000.0 | 327000.0 |
| 32 | 11 | 17000.0 | 327000.0 |
+-------------+-------------+------------+------------+
{noformat}
The new code still needs more unit tests (with multiple batches), can be
optimised to not aggregate the same frame multiple times (for peer rows) and it
doesn't yet support the _frame clause_. But it gives correct results, so it's
already a fix for DRILL-1487.
> new window function implementation
> ----------------------------------
>
> Key: DRILL-1908
> URL: https://issues.apache.org/jira/browse/DRILL-1908
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Operators
> Reporter: Deneche A. Hakim
> Assignee: Deneche A. Hakim
> Priority: Critical
> Fix For: Future
>
>
> In order to fix DRILL-1487 a complete rewrite of the
> StreamingWindowFrameRecordBatch may be needed. The purpose of this issue is
> to report my progress and share my thoughts with the community in order to
> get a proper implementation
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)