[ 
https://issues.apache.org/jira/browse/MADLIB-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316959#comment-15316959
 ] 

Frank McQuillan edited comment on MADLIB-1004 at 6/17/16 6:53 PM:
------------------------------------------------------------------

Here is the proposed MVP pivot interface.  It is similar to Pandas [2] with 
some naming changes:

{code}
madlib.pivot (
   source_table,
   output_table,
   index,  
   pivot_cols,
   pivot_values, 
   aggregate_func,
   keep_null,
   fill_value
)
{code}

source_table
VARCHAR.  Name of the source table.  Can also be a view.

output_table
VARCHAR.  Name of the result table.

index
VARCHAR.  Single column or list of comma-separated columns that will form the 
index of the output pivot table.  By index we mean the values to group by in 
the rows of the output pivot table.

pivot_cols
VARCHAR.  Single column or list of comma-separated columns that will form the 
columns of the output pivot table.

pivot_values
VARCHAR.  A comma-separated list of columns that contain the values to be 
summarized in the output pivot table.  For more than one value, treat like 
Pandas and add a block off to the right for each new value.

aggregate_func
VARCHAR.  A comma-separated list of aggregates to be applied to values.  These 
can be PostgreSQL aggregates or UDAs.  List of aggregates applied to lists of 
values to be supported.  See Pandas for example.

keep_null (optional)
BOOLEAN. default: FALSE.   If TRUE, then pivot columns are created 
corresponding to NULL categories.  If FALSE, then no pivot columns will be 
created for NULL categories.

fill_value (optional)
VARCHAR.  If specified, determines how to fill NULL values resulting from pivot 
operation.  For example, user may want to set NULLs to zero.  This is a global 
parameter (not applied per aggregate) and is applied post-aggregation to the 
output table.  Otherwise these values will be left as determined by PostgreSQL.


was (Author: fmcquillan):
Here is the proposed MVP pivot interface.  It is similar to Pandas [2] with 
some naming changes:

{code}
madlib.pivot (
   source_table,
   output_table,
   index,  
   pivot_cols,
   pivot_values, 
   aggregate_func,
   keep_null,
   fill_value
)
{code}

source_table
VARCHAR.  Name of the source table.  Can also be a view.

output_table
VARCHAR.  Name of the result table.

index
VARCHAR.  Single column or list of comma-separated columns that will form the 
index of the output pivot table.  By index we mean the values to group by in 
the rows of the output pivot table.

pivot_cols
VARCHAR.  Single column or list of comma-separated columns that will form the 
columns of the output pivot table.

pivot_values
VARCHAR.  A comma-separated list of columns that contain the values to be 
summarized in the output pivot table.  For more than one value, treat like 
Pandas and add a block off to the right for each new value.

aggregate_func
VARCHAR.  A comma-separated list of aggregates to be applied to values.  These 
can be PostgreSQL aggregates or UDAs.  Multiple aggregates can be specified, 
and will be applied separately to value columns.  Like Pandas, need to support 
having certain aggregates apply to certain value columns.

keep_null (optional)
BOOLEAN. default: FALSE.   If TRUE, then pivot columns are created 
corresponding to NULL categories.  If FALSE, then no pivot columns will be 
created for NULL categories.

fill_value (optional)
VARCHAR.  If specified, determines how to fill NULL values resulting from pivot 
operation.  For example, user may want to set NULLs to zero.  Otherwise these 
values will be left as determined by PostgreSQL.


> Pivoting - Phase 2 (advanced pivot)
> -----------------------------------
>
>                 Key: MADLIB-1004
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1004
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>             Fix For: v1.9.1
>
>
> Story 
> As a data scientist, I want to perform *advanced* pivot operation on my data, 
> so that I can prepare it for input to predictive analytics algorithms.
> Details
> * Advanced pivot for this story means the features defined as MVP in the 
> requirements attached.
> In general, we are following Pandas ideas.
> References
> [1] Pivot table general information, like what is pivoting?
> https://en.wikipedia.org/wiki/Pivot_table
> [2] Pandas pivot tables and cross-tabulations
> http://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables-and-cross-tabulations
> http://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook-pivot
> http://pbpython.com/pandas-pivot-table-explained.html
> [3] GPDB pivot_sum function
> http://gpdb.docs.pivotal.io/4320/admin_guide/query.html#topic30
> [4] PostgreSQL tablefunc
> http://www.postgresql.org/docs/9.4/static/tablefunc.html
> [5] PDL tools pivoting routines
> http://pdl-tools.pa.pivotal.io/group__grp__pivot.html
> http://pdl-tools.pa.pivotal.io/group__grp__pivot01.html
> [6] Aster Pivot and Unpivot functions
> User Guide
> http://www.info.teradata.com/eDownload.cfm?itemid=122580002
> [7] PostgreSQL aggregates
> http://www.postgresql.org/docs/8.2/static/functions-aggregate.html
> [8] PostgreSQL basic statements/assignment operator, 
> http://www.postgresql.org/docs/8.2/static/plpgsql-statements.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to