[
https://issues.apache.org/jira/browse/MADLIB-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313214#comment-15313214
]
Orhan Kislal commented on MADLIB-908:
-------------------------------------
I am thinking of following the "Encoding Categorical Variables" example.
(http://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html)
I followed the example and checked the sql query created by this function:
CREATE TABLE abalone_out AS
(SELECT *', ',
( CASE WHEN "sex" = '' THEN 1 ELSE 0 END ) as "sex_",
( CASE WHEN "sex" = 'F' THEN 1 ELSE 0 END ) as "sex_F",
( CASE WHEN "sex" = 'I' THEN 1 ELSE 0 END ) as "sex_I",
( CASE WHEN "sex" = 'M' THEN 1 ELSE 0 END ) as "sex_M"', ' FROM
abalone)
I am using a very simple dataset to begin with:
id | piv | val
----+-----+-----
0 | 10 | 1
0 | 10 | 2
0 | 20 | 3
1 | 20 | 4
1 | 30 | 5
1 | 30 | 6
1 | 10 | 7
Here is the sql query that pivots the data:
CREATE TABLE piv_out AS
(SELECT id, piv, sum(val) FROM
(SELECT *,
( CASE WHEN piv = 10 THEN 1 ELSE 0 END ) as "piv_10",
( CASE WHEN piv = 20 THEN 1 ELSE 0 END ) as "piv_20",
( CASE WHEN piv = 30 THEN 1 ELSE 0 END ) as "piv_30"
FROM pivset) x
GROUP BY id, piv, piv_10, piv_20, piv_30
ORDER BY id, piv);
The output table has the following:
id | piv | sum
----+-----+-----
0 | 10 | 3
0 | 20 | 3
1 | 10 | 7
1 | 20 | 4
1 | 30 | 11
Now, I will work on the python code that generates this sql query. First, I
will implement a version that supports - single index column
- single pivot column
- single value column
- single aggregate function = SUM
- no fill value
Please let me know if you have any comments on my approach.
> Pivoting
> --------
>
> Key: MADLIB-908
> URL: https://issues.apache.org/jira/browse/MADLIB-908
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Fix For: v1.9.1
>
>
> Story
> As a data scientist, I want to perform pivoting operations on my data, so
> that I can prepare it for input to predictive analytics algorithms.
> Detailed requirements TBD.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)