[ 
https://issues.apache.org/jira/browse/MADLIB-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313401#comment-15313401
 ] 

Orhan Kislal commented on MADLIB-908:
-------------------------------------

I think we can push some of the aggregate down. 
In this case we will have 3 steps in the procedure:
- For every unique id,piv pair aggregate values
- Create piv_X columns and fill with tmp_val or 0
- For every id, collapse every piv_X column into single value since it has a 
bunch of 0s and only one actual value. 

The intermediate table will be smaller and the second aggregate is bound by the 
number of different pivot values. Since there is a limit to the number of 
columns a table can have I think this bound is fairly manageable. Here is the 
sql query:

SELECT id, sum(piv_10), sum(piv_20), sum(piv_30)
FROM
(SELECT id,
(CASE WHEN piv = 10 THEN tmp_sum ELSE 0 END) as "piv_10" ,
(CASE WHEN piv = 20 THEN tmp_sum ELSE 0 END) as "piv_20" ,
(CASE WHEN piv = 30 THEN tmp_sum ELSE 0 END) as "piv_30"
FROM
(SELECT id, piv, sum(val) AS tmp_sum FROM pivset GROUP BY id,piv) x 
GROUP BY id, piv, tmp_sum) y
GROUP BY id ORDER BY id;

> Pivoting
> --------
>
>                 Key: MADLIB-908
>                 URL: https://issues.apache.org/jira/browse/MADLIB-908
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>             Fix For: v1.9.1
>
>
> Story
> As a data scientist, I want to perform pivoting operations on my data, so 
> that I can prepare it for input to predictive analytics algorithms.
> Detailed requirements TBD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to