[
https://issues.apache.org/jira/browse/MADLIB-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313401#comment-15313401
]
Orhan Kislal commented on MADLIB-908:
-------------------------------------
I think we can push some of the aggregate down.
In this case we will have 3 steps in the procedure:
- For every unique id,piv pair aggregate values
- Create piv_X columns and fill with tmp_val or 0
- For every id, collapse every piv_X column into single value since it has a
bunch of 0s and only one actual value.
The intermediate table will be smaller and the second aggregate is bound by the
number of different pivot values. Since there is a limit to the number of
columns a table can have I think this bound is fairly manageable. Here is the
sql query:
SELECT id, sum(piv_10), sum(piv_20), sum(piv_30)
FROM
(SELECT id,
(CASE WHEN piv = 10 THEN tmp_sum ELSE 0 END) as "piv_10" ,
(CASE WHEN piv = 20 THEN tmp_sum ELSE 0 END) as "piv_20" ,
(CASE WHEN piv = 30 THEN tmp_sum ELSE 0 END) as "piv_30"
FROM
(SELECT id, piv, sum(val) AS tmp_sum FROM pivset GROUP BY id,piv) x
GROUP BY id, piv, tmp_sum) y
GROUP BY id ORDER BY id;
> Pivoting
> --------
>
> Key: MADLIB-908
> URL: https://issues.apache.org/jira/browse/MADLIB-908
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Fix For: v1.9.1
>
>
> Story
> As a data scientist, I want to perform pivoting operations on my data, so
> that I can prepare it for input to predictive analytics algorithms.
> Detailed requirements TBD.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)