[ 
https://issues.apache.org/jira/browse/MADLIB-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316963#comment-15316963
 ] 

Frank McQuillan edited comment on MADLIB-908 at 6/6/16 6:46 PM:
----------------------------------------------------------------

Here is the proposed MVP pivot interface.  It is similar to Pandas [2] with 
some naming changes.  Note that some of these features will be done in phase 2 
of the pivot https://issues.apache.org/jira/browse/MADLIB-1004

{code}
madlib.pivot (
   source_table,
   output_table,
   index,  
   pivot_cols,
   pivot_values, 
   aggregate_func,
   keep_null,
   fill_value
)
{code}

source_table
VARCHAR.  Name of the source table.  Can also be a view.

output_table
VARCHAR.  Name of the result table.

index
VARCHAR.  Single column or list of comma-separated columns that will form the 
index of the output pivot table.  By index we mean the values to group by in 
the rows of the output pivot table.

pivot_cols
VARCHAR.  Single column or list of comma-separated columns that will form the 
columns of the output pivot table.

pivot_values
VARCHAR.  A comma-separated list of columns that contain the values to be 
summarized in the output pivot table.  For more than one value, treat like 
Pandas and add a block off to the right for each new value.

aggregate_func
VARCHAR.  A comma-separated list of aggregates to be applied to values.  These 
can be PostgreSQL aggregates or UDAs.  Multiple aggregates can be specified, 
and will be applied separately to value columns.  Like Pandas, need to support 
having certain aggregates apply to certain value columns.

keep_null (optional)
BOOLEAN. default: FALSE.   If TRUE, then pivot columns are created 
corresponding to NULL categories.  If FALSE, then no pivot columns will be 
created for NULL categories.

fill_value (optional)
VARCHAR.  If specified, determines how to fill NULL values resulting from pivot 
operation.  For example, user may want to set NULLs to zero.  Otherwise these 
values will be left as determined by PostgreSQL.



was (Author: fmcquillan):
Here is the proposed MVP pivot interface.  It is similar to Pandas [2] with 
some naming changes:

{code}
madlib.pivot (
   source_table,
   output_table,
   index,  
   pivot_cols,
   pivot_values, 
   aggregate_func,
   keep_null,
   fill_value
)
{code}

source_table
VARCHAR.  Name of the source table.  Can also be a view.

output_table
VARCHAR.  Name of the result table.

index
VARCHAR.  Single column or list of comma-separated columns that will form the 
index of the output pivot table.  By index we mean the values to group by in 
the rows of the output pivot table.

pivot_cols
VARCHAR.  Single column or list of comma-separated columns that will form the 
columns of the output pivot table.

pivot_values
VARCHAR.  A comma-separated list of columns that contain the values to be 
summarized in the output pivot table.  For more than one value, treat like 
Pandas and add a block off to the right for each new value.

aggregate_func
VARCHAR.  A comma-separated list of aggregates to be applied to values.  These 
can be PostgreSQL aggregates or UDAs.  Multiple aggregates can be specified, 
and will be applied separately to value columns.  Like Pandas, need to support 
having certain aggregates apply to certain value columns.

keep_null (optional)
BOOLEAN. default: FALSE.   If TRUE, then pivot columns are created 
corresponding to NULL categories.  If FALSE, then no pivot columns will be 
created for NULL categories.

fill_value (optional)
VARCHAR.  If specified, determines how to fill NULL values resulting from pivot 
operation.  For example, user may want to set NULLs to zero.  Otherwise these 
values will be left as determined by PostgreSQL.


> Pivoting - Phase 1 (basic pivot)
> --------------------------------
>
>                 Key: MADLIB-908
>                 URL: https://issues.apache.org/jira/browse/MADLIB-908
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>             Fix For: v1.9.1
>
>
> Story 
> As a data scientist, I want to perform *basic* pivot operation on my data, so 
> that I can prepare it for input to predictive analytics algorithms.
> Details
> Basic pivot for this story means:
> * single index column
> * single pivot  column
> * single value column
> * single aggregate function = SUM
> * no fill value 
> PDL Tools [5] supports some of this functionality but its interface is 
> confusing.   Pandas [2], PostgreSQL [4] and Aster [6] also have some version 
> of pivoting that we can learn from. 
> In general, we are following Pandas ideas.
> References
> [1] Pivot table general information, like what is pivoting?
> https://en.wikipedia.org/wiki/Pivot_table
> [2] Pandas pivot tables and cross-tabulations
> http://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables-and-cross-tabulations
> http://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook-pivot
> http://pbpython.com/pandas-pivot-table-explained.html
> [3] GPDB pivot_sum function
> http://gpdb.docs.pivotal.io/4320/admin_guide/query.html#topic30
> [4] PostgreSQL tablefunc
> http://www.postgresql.org/docs/9.4/static/tablefunc.html
> [5] PDL tools pivoting routines
> http://pdl-tools.pa.pivotal.io/group__grp__pivot.html
> http://pdl-tools.pa.pivotal.io/group__grp__pivot01.html
> [6] Aster Pivot and Unpivot functions
> User Guide
> http://www.info.teradata.com/eDownload.cfm?itemid=122580002
> [7] PostgreSQL aggregates
> http://www.postgresql.org/docs/8.2/static/functions-aggregate.html
> [8] PostgreSQL basic statements/assignment operator, 
> http://www.postgresql.org/docs/8.2/static/plpgsql-statements.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to