[ 
https://issues.apache.org/jira/browse/MADLIB-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313214#comment-15313214
 ] 

Orhan Kislal commented on MADLIB-908:
-------------------------------------

I am thinking of following the "Encoding Categorical Variables" example. 
(http://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html)

I followed the example and checked the sql query created by this function:
CREATE TABLE abalone_out AS
        (SELECT *', ',
                ( CASE WHEN "sex" = '' THEN 1 ELSE 0 END ) as "sex_",
                ( CASE WHEN "sex" = 'F' THEN 1 ELSE 0 END ) as "sex_F",
                ( CASE WHEN "sex" = 'I' THEN 1 ELSE 0 END ) as "sex_I",
                ( CASE WHEN "sex" = 'M' THEN 1 ELSE 0 END ) as "sex_M"', ' FROM 
abalone)

I am using a very simple dataset to begin with:
id | piv | val
----+-----+-----
  0 |  10 |   1
  0 |  10 |   2
  0 |  20 |   3
  1 |  20 |   4
  1 |  30 |   5
  1 |  30 |   6
  1 |  10 |   7

Here is the sql query that pivots the data:

CREATE TABLE piv_out AS
        (SELECT id, piv, sum(val) FROM
                (SELECT *,
                        ( CASE WHEN piv = 10 THEN 1 ELSE 0 END ) as "piv_10",
                        ( CASE WHEN piv = 20 THEN 1 ELSE 0 END ) as "piv_20",
                        ( CASE WHEN piv = 30 THEN 1 ELSE 0 END ) as "piv_30" 
FROM pivset) x
                GROUP BY id, piv, piv_10, piv_20, piv_30
                ORDER BY id, piv);

The output table has the following:

id | piv | sum
----+-----+-----
  0 |  10 |   3
  0 |  20 |   3
  1 |  10 |   7
  1 |  20 |   4
  1 |  30 |  11

Now, I will work on the python code that generates this sql query. First, I 
will implement a version that supports - single index column
- single pivot column
- single value column
- single aggregate function = SUM
- no fill value

Please let me know if you have any comments on my approach.

> Pivoting
> --------
>
>                 Key: MADLIB-908
>                 URL: https://issues.apache.org/jira/browse/MADLIB-908
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>             Fix For: v1.9.1
>
>
> Story
> As a data scientist, I want to perform pivoting operations on my data, so 
> that I can prepare it for input to predictive analytics algorithms.
> Detailed requirements TBD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to