[ 
https://issues.apache.org/jira/browse/DRILL-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272822#comment-14272822
 ] 

Deneche A. Hakim edited comment on DRILL-1908 at 1/11/15 2:42 PM:
------------------------------------------------------------------

I cleaned the code and added support for _order by_ clause, I also added a new 
unit test for the _order by_ feature. Here is an example of a query with an 
_order by_ clause:
{noformat}
SELECT employee_id, position_id, salary, SUM(salary) OVER (PARTITION BY 
position_id ORDER BY salary) AS sum_sal FROM cp.`employee.json` ORDER BY 
position_id LIMIT 50;
+-------------+-------------+------------+------------+
| employee_id | position_id |   salary   |  sum_sal   |
+-------------+-------------+------------+------------+
| 1           | 1           | 80000.0    | 80000.0    |
| 20          | 2           | 30000.0    | 30000.0    |
| 22          | 2           | 35000.0    | 135000.0   |
| 5           | 2           | 35000.0    | 135000.0   |
| 21          | 2           | 35000.0    | 135000.0   |
| 4           | 2           | 40000.0    | 215000.0   |
| 2           | 2           | 40000.0    | 215000.0   |
| 6           | 3           | 25000.0    | 25000.0    |
| 7           | 4           | 15000.0    | 15000.0    |
| 10          | 5           | 50000.0    | 50000.0    |
| 37          | 6           | 6700.0     | 6700.0     |
| 38          | 6           | 8000.0     | 14700.0    |
| 39          | 6           | 10000.0    | 34700.0    |
| 40          | 6           | 10000.0    | 34700.0    |
| 42          | 7           | 5000.0     | 5000.0     |
| 41          | 7           | 8500.0     | 13500.0    |
| 36          | 7           | 45000.0    | 58500.0    |
| 44          | 8           | 5000.0     | 5000.0     |
| 43          | 8           | 6700.0     | 11700.0    |
| 49          | 9           | 5000.0     | 20000.0    |
| 52          | 9           | 5000.0     | 20000.0    |
| 51          | 9           | 5000.0     | 20000.0    |
| 50          | 9           | 5000.0     | 20000.0    |
| 47          | 9           | 6500.0     | 26500.0    |
| 46          | 9           | 6600.0     | 33100.0    |
| 45          | 9           | 6800.0     | 39900.0    |
| 48          | 9           | 7200.0     | 47100.0    |
| 34          | 11          | 7000.0     | 7000.0     |
| 29          | 11          | 8500.0     | 24000.0    |
| 12          | 11          | 8500.0     | 24000.0    |
| 17          | 11          | 10000.0    | 54000.0    |
| 19          | 11          | 10000.0    | 54000.0    |
| 8           | 11          | 10000.0    | 54000.0    |
| 35          | 11          | 11000.0    | 65000.0    |
| 15          | 11          | 12000.0    | 89000.0    |
| 25          | 11          | 12000.0    | 89000.0    |
| 28          | 11          | 14000.0    | 117000.0   |
| 31          | 11          | 14000.0    | 117000.0   |
| 26          | 11          | 15000.0    | 192000.0   |
| 11          | 11          | 15000.0    | 192000.0   |
| 13          | 11          | 15000.0    | 192000.0   |
| 23          | 11          | 15000.0    | 192000.0   |
| 30          | 11          | 15000.0    | 192000.0   |
| 27          | 11          | 16000.0    | 208000.0   |
| 16          | 11          | 17000.0    | 327000.0   |
| 14          | 11          | 17000.0    | 327000.0   |
| 33          | 11          | 17000.0    | 327000.0   |
| 24          | 11          | 17000.0    | 327000.0   |
| 9           | 11          | 17000.0    | 327000.0   |
| 32          | 11          | 17000.0    | 327000.0   |
+-------------+-------------+------------+------------+
{noformat}
The new code still needs more unit tests (with multiple batches), can be 
optimized to aggregate a frame a single time for all peer rows and it doesn't 
yet support the _frame clause_. But it gives correct results, so it's already a 
fix for DRILL-1487.


was (Author: adeneche):
I cleaned the code and added support for _order by_ clause, I also added a new 
unit test for the _order by_ feature. Here is an example of a query with an 
_order by_ clause:
{noformat}
SELECT employee_id, position_id, salary, SUM(salary) OVER (PARTITION BY 
position_id ORDER BY salary) AS sum_sal FROM cp.`employee.json` ORDER BY 
position_id LIMIT 50;
+-------------+-------------+------------+------------+
| employee_id | position_id |   salary   |  sum_sal   |
+-------------+-------------+------------+------------+
| 1           | 1           | 80000.0    | 80000.0    |
| 20          | 2           | 30000.0    | 30000.0    |
| 22          | 2           | 35000.0    | 135000.0   |
| 5           | 2           | 35000.0    | 135000.0   |
| 21          | 2           | 35000.0    | 135000.0   |
| 4           | 2           | 40000.0    | 215000.0   |
| 2           | 2           | 40000.0    | 215000.0   |
| 6           | 3           | 25000.0    | 25000.0    |
| 7           | 4           | 15000.0    | 15000.0    |
| 10          | 5           | 50000.0    | 50000.0    |
| 37          | 6           | 6700.0     | 6700.0     |
| 38          | 6           | 8000.0     | 14700.0    |
| 39          | 6           | 10000.0    | 34700.0    |
| 40          | 6           | 10000.0    | 34700.0    |
| 42          | 7           | 5000.0     | 5000.0     |
| 41          | 7           | 8500.0     | 13500.0    |
| 36          | 7           | 45000.0    | 58500.0    |
| 44          | 8           | 5000.0     | 5000.0     |
| 43          | 8           | 6700.0     | 11700.0    |
| 49          | 9           | 5000.0     | 20000.0    |
| 52          | 9           | 5000.0     | 20000.0    |
| 51          | 9           | 5000.0     | 20000.0    |
| 50          | 9           | 5000.0     | 20000.0    |
| 47          | 9           | 6500.0     | 26500.0    |
| 46          | 9           | 6600.0     | 33100.0    |
| 45          | 9           | 6800.0     | 39900.0    |
| 48          | 9           | 7200.0     | 47100.0    |
| 34          | 11          | 7000.0     | 7000.0     |
| 29          | 11          | 8500.0     | 24000.0    |
| 12          | 11          | 8500.0     | 24000.0    |
| 17          | 11          | 10000.0    | 54000.0    |
| 19          | 11          | 10000.0    | 54000.0    |
| 8           | 11          | 10000.0    | 54000.0    |
| 35          | 11          | 11000.0    | 65000.0    |
| 15          | 11          | 12000.0    | 89000.0    |
| 25          | 11          | 12000.0    | 89000.0    |
| 28          | 11          | 14000.0    | 117000.0   |
| 31          | 11          | 14000.0    | 117000.0   |
| 26          | 11          | 15000.0    | 192000.0   |
| 11          | 11          | 15000.0    | 192000.0   |
| 13          | 11          | 15000.0    | 192000.0   |
| 23          | 11          | 15000.0    | 192000.0   |
| 30          | 11          | 15000.0    | 192000.0   |
| 27          | 11          | 16000.0    | 208000.0   |
| 16          | 11          | 17000.0    | 327000.0   |
| 14          | 11          | 17000.0    | 327000.0   |
| 33          | 11          | 17000.0    | 327000.0   |
| 24          | 11          | 17000.0    | 327000.0   |
| 9           | 11          | 17000.0    | 327000.0   |
| 32          | 11          | 17000.0    | 327000.0   |
+-------------+-------------+------------+------------+
{noformat}
The new code still needs more unit tests (with multiple batches), can be 
optimised to not aggregate the same frame multiple times (for peer rows) and it 
doesn't yet support the _frame clause_. But it gives correct results, so it's 
already a fix for DRILL-1487.

> new window function implementation
> ----------------------------------
>
>                 Key: DRILL-1908
>                 URL: https://issues.apache.org/jira/browse/DRILL-1908
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Operators
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>            Priority: Critical
>             Fix For: Future
>
>
> In order to fix DRILL-1487 a complete rewrite of the 
> StreamingWindowFrameRecordBatch may be needed. The purpose of this issue is 
> to report my progress and share my thoughts with the community in order to 
> get a proper implementation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to