[
https://issues.apache.org/jira/browse/MADLIB-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316959#comment-15316959
]
Frank McQuillan edited comment on MADLIB-1004 at 6/17/16 6:53 PM:
------------------------------------------------------------------
Here is the proposed MVP pivot interface. It is similar to Pandas [2] with
some naming changes:
{code}
madlib.pivot (
source_table,
output_table,
index,
pivot_cols,
pivot_values,
aggregate_func,
keep_null,
fill_value
)
{code}
source_table
VARCHAR. Name of the source table. Can also be a view.
output_table
VARCHAR. Name of the result table.
index
VARCHAR. Single column or list of comma-separated columns that will form the
index of the output pivot table. By index we mean the values to group by in
the rows of the output pivot table.
pivot_cols
VARCHAR. Single column or list of comma-separated columns that will form the
columns of the output pivot table.
pivot_values
VARCHAR. A comma-separated list of columns that contain the values to be
summarized in the output pivot table. For more than one value, treat like
Pandas and add a block off to the right for each new value.
aggregate_func
VARCHAR. A comma-separated list of aggregates to be applied to values. These
can be PostgreSQL aggregates or UDAs. List of aggregates applied to lists of
values to be supported. See Pandas for example.
keep_null (optional)
BOOLEAN. default: FALSE. If TRUE, then pivot columns are created
corresponding to NULL categories. If FALSE, then no pivot columns will be
created for NULL categories.
fill_value (optional)
VARCHAR. If specified, determines how to fill NULL values resulting from pivot
operation. For example, user may want to set NULLs to zero. This is a global
parameter (not applied per aggregate) and is applied post-aggregation to the
output table. Otherwise these values will be left as determined by PostgreSQL.
was (Author: fmcquillan):
Here is the proposed MVP pivot interface. It is similar to Pandas [2] with
some naming changes:
{code}
madlib.pivot (
source_table,
output_table,
index,
pivot_cols,
pivot_values,
aggregate_func,
keep_null,
fill_value
)
{code}
source_table
VARCHAR. Name of the source table. Can also be a view.
output_table
VARCHAR. Name of the result table.
index
VARCHAR. Single column or list of comma-separated columns that will form the
index of the output pivot table. By index we mean the values to group by in
the rows of the output pivot table.
pivot_cols
VARCHAR. Single column or list of comma-separated columns that will form the
columns of the output pivot table.
pivot_values
VARCHAR. A comma-separated list of columns that contain the values to be
summarized in the output pivot table. For more than one value, treat like
Pandas and add a block off to the right for each new value.
aggregate_func
VARCHAR. A comma-separated list of aggregates to be applied to values. These
can be PostgreSQL aggregates or UDAs. Multiple aggregates can be specified,
and will be applied separately to value columns. Like Pandas, need to support
having certain aggregates apply to certain value columns.
keep_null (optional)
BOOLEAN. default: FALSE. If TRUE, then pivot columns are created
corresponding to NULL categories. If FALSE, then no pivot columns will be
created for NULL categories.
fill_value (optional)
VARCHAR. If specified, determines how to fill NULL values resulting from pivot
operation. For example, user may want to set NULLs to zero. Otherwise these
values will be left as determined by PostgreSQL.
> Pivoting - Phase 2 (advanced pivot)
> -----------------------------------
>
> Key: MADLIB-1004
> URL: https://issues.apache.org/jira/browse/MADLIB-1004
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Fix For: v1.9.1
>
>
> Story
> As a data scientist, I want to perform *advanced* pivot operation on my data,
> so that I can prepare it for input to predictive analytics algorithms.
> Details
> * Advanced pivot for this story means the features defined as MVP in the
> requirements attached.
> In general, we are following Pandas ideas.
> References
> [1] Pivot table general information, like what is pivoting?
> https://en.wikipedia.org/wiki/Pivot_table
> [2] Pandas pivot tables and cross-tabulations
> http://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables-and-cross-tabulations
> http://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook-pivot
> http://pbpython.com/pandas-pivot-table-explained.html
> [3] GPDB pivot_sum function
> http://gpdb.docs.pivotal.io/4320/admin_guide/query.html#topic30
> [4] PostgreSQL tablefunc
> http://www.postgresql.org/docs/9.4/static/tablefunc.html
> [5] PDL tools pivoting routines
> http://pdl-tools.pa.pivotal.io/group__grp__pivot.html
> http://pdl-tools.pa.pivotal.io/group__grp__pivot01.html
> [6] Aster Pivot and Unpivot functions
> User Guide
> http://www.info.teradata.com/eDownload.cfm?itemid=122580002
> [7] PostgreSQL aggregates
> http://www.postgresql.org/docs/8.2/static/functions-aggregate.html
> [8] PostgreSQL basic statements/assignment operator,
> http://www.postgresql.org/docs/8.2/static/plpgsql-statements.html
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)