[ https://issues.apache.org/jira/browse/MADLIB-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926625#comment-15926625 ]

ASF GitHub Bot commented on MADLIB-1066:
----------------------------------------

GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/108

    Pivot: Add support for array output

    JIRA: MADLIB-1066
    
    When the total number of pivoted columns exceeds 1600, an array output becomes
    essential. This commit adds support to get each pivoted set of columns
    (all columns related to a particular value-aggregate combination) as an
    array. There is also support for getting the output as madlib.svec.
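A minimal Python sketch of the idea, not the MADlib implementation: rows are grouped by the index column, and for a single value-aggregate pair the results for every distinct pivot value are collected into one list instead of one column each. The helper name `pivot_to_arrays`, the list-of-dicts row format, and the use of a single value column with `sum` as the default aggregate are all assumptions for illustration.

```python
from collections import defaultdict


def pivot_to_arrays(rows, index, pivot_col, value_col, agg=sum, fill_value=None):
    """Hypothetical helper: for each index value, return ONE list holding
    agg(value_col) for every distinct pivot_col value, instead of emitting
    one output column per pivot value."""
    # Sorted distinct pivot values fix the position of each array slot.
    pivot_values = sorted({r[pivot_col] for r in rows})

    # Collect the raw values that fall into each (index, pivot value) cell.
    cells = defaultdict(list)
    for r in rows:
        cells[(r[index], r[pivot_col])].append(r[value_col])

    out = {}
    for idx in sorted({r[index] for r in rows}):
        arr = [fill_value] * len(pivot_values)
        for pos, v in enumerate(pivot_values):
            vals = cells.get((idx, v))
            if vals:  # leave fill_value where no rows exist for this cell
                arr[pos] = agg(vals)
        out[idx] = arr
    return pivot_values, out


# Example usage: index 'id', pivot on 'piv', aggregate 'val' with sum.
rows = [
    {"id": 1, "piv": 10, "val": 1},
    {"id": 1, "piv": 10, "val": 2},
    {"id": 1, "piv": 20, "val": 3},
    {"id": 2, "piv": 30, "val": 4},
]
names, arrays = pivot_to_arrays(rows, "id", "piv", "val")
print(names, arrays)
```

Here the three pivot values occupy three array positions rather than three table columns, so the width of the output table no longer grows with the number of distinct pivot values.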

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib feature/pivot_array_support

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/108.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #108
    
----
commit fe579d1300f0c30eeb72e7d8b411af9cdffe2c59
Author: Rahul Iyer
Date:   2017-03-11T00:45:03Z

    Pivot: Add support for array output
    
    JIRA: MADLIB-1066
    
    When the total number of pivoted columns exceeds 1600, an array output becomes
    essential. This commit adds support to get each pivoted set of columns
    (all columns related to a particular value-aggregate combination) as an
    array. There is also support for getting the output as madlib.svec.

----


> Pivoting - support array and svec output
> ----------------------------------------
>
>                 Key: MADLIB-1066
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1066
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.11
>
>
> Background
> Follow-on to these JIRAs:
> https://issues.apache.org/jira/browse/MADLIB-908
> https://issues.apache.org/jira/browse/MADLIB-1004
> This capability carries over some good ideas from
> https://issues.apache.org/jira/browse/MADLIB-1038
> Story
> Support an array output format to allow more than 1600 output columns (the
> PostgreSQL column limit). Many MADlib algorithms take array input, so pivot
> should support array output as well. Base this on how it is done in encoding
> categorical variables:
> http://madlib.incubator.apache.org/docs/latest/group__grp__encode__categorical.html
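A rough way to see when the limit is hit: the sketch below counts the columns that 'column' output would need. It assumes, as a simplification, one output column per combination of distinct pivot-column values for each value-aggregate pair, plus the index columns; `PG_MAX_COLUMNS` reflects PostgreSQL's 1600-column table limit. The function names are hypothetical.

```python
from math import prod

PG_MAX_COLUMNS = 1600  # PostgreSQL's maximum number of columns in a table


def pivot_column_count(n_index_cols, distinct_counts, n_value_agg_pairs):
    """Hypothetical estimate of the output width for 'column' output.

    n_index_cols:      index columns carried through unchanged
    distinct_counts:   number of distinct values in each pivot column
    n_value_agg_pairs: number of (pivot_value, aggregate) combinations
    """
    # Assumes every combination of distinct pivot values gets its own column.
    combos = prod(distinct_counts)
    return n_index_cols + combos * n_value_agg_pairs


def needs_array_output(n_index_cols, distinct_counts, n_value_agg_pairs):
    """True when 'column' output would exceed the PostgreSQL limit."""
    return pivot_column_count(n_index_cols, distinct_counts,
                              n_value_agg_pairs) > PG_MAX_COLUMNS
```

For example, one index column, a pivot column with 500 distinct values and 4 value-aggregate pairs gives 1 + 500 * 4 = 2001 columns, which is over the limit; with one array per value-aggregate pair the same result fits in 1 + 4 = 5 columns.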
> Add 'output_type' to interface:
> {code}
> pivot(
>     source_table,
>     output_table,
>     index,
>     pivot_cols,
>     pivot_values,
>     aggregate_func,
>     fill_value,
>     keep_null,
>     output_col_dictionary,
>     output_type                          -- New
>     )
> {code}
> where
> {code}
> output_type (optional)
> VARCHAR. Default: 'column'. This parameter controls the output format. If
> 'column', a column is created for each output variable. PostgreSQL limits the
> number of columns in a table. If the total number of columns exceeds the
> limit, set this parameter to either 'array' to combine the pivoted
> columns into an array or 'svec' to cast the array output to the 'madlib.svec'
> type.
> Since the array output for any single tuple would be sparse, the 'svec' 
> output would be most efficient for storage. The 'array' output is useful if 
> the array is used for post-processing, including concatenating with other 
> non-categorical features.
> A dictionary will be created when 'output_type' is 'array' or 'svec' to
> define an index into the array. The dictionary table is named after
> 'output_table' with the suffix '_dictionary'.
> {code}
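The sketch below illustrates two ideas from the `output_type` description: a dictionary mapping each array position to the pivot value it stands for, and a run-length encoding that shows why a sparse array is cheaper to store compressed. The `{counts}:{values}` string resembles the text form of `madlib.svec`, but this is plain Python, not MADlib's svec code. The function names and the 1-based positions (matching PostgreSQL's default array indexing) are assumptions.

```python
def build_dictionary(pivot_values, output_table):
    """Hypothetical: dictionary table name plus a map of 1-based array
    position -> pivot value."""
    name = output_table + "_dictionary"
    return name, {pos: v for pos, v in enumerate(pivot_values, start=1)}


def run_length_encode(arr):
    """Compress a list into parallel (counts, values) runs."""
    counts, values = [], []
    for x in arr:
        if values and values[-1] == x:
            counts[-1] += 1  # extend the current run
        else:
            counts.append(1)  # start a new run
            values.append(x)
    return counts, values


def to_svec_text(arr):
    """Render runs as '{counts}:{values}', similar to svec's text form."""
    counts, values = run_length_encode(arr)
    return "{%s}:{%s}" % (",".join(map(str, counts)),
                          ",".join(map(str, values)))


print(build_dictionary([10, 20, 30], "pivout"))
print(to_svec_text([0, 0, 0, 5, 0, 0, 0, 0, 2]))
```

Nine array slots with only two non-zero entries collapse to four runs, which is the storage saving the description attributes to 'svec'; the 'array' form keeps every slot and is simpler to concatenate with other features.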
> See the code in
> http://madlib.incubator.apache.org/docs/latest/group__grp__encode__categorical.html
> NULL needs to be supported (treated as the default, 'column'). Also, 'a', 'Array'
> and 'arr' should be interpreted as 'array'; the same applies to 'column' and 'svec'.
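One way to implement the input rule for `output_type` (NULL means 'column'; abbreviations such as 'a', 'arr' and 'Array' mean 'array'): a minimal sketch, assuming case-insensitive prefix matching where the input must match exactly one valid type. The function and constant names are hypothetical, not MADlib's.

```python
VALID_OUTPUT_TYPES = ("column", "array", "svec")


def normalize_output_type(output_type):
    """Hypothetical parser: None (SQL NULL) -> 'column'; otherwise a
    case-insensitive prefix of exactly one valid type -> that type."""
    if output_type is None:
        return "column"
    s = output_type.strip().lower()
    matches = [t for t in VALID_OUTPUT_TYPES if t.startswith(s)]
    # An empty string matches every type, an unknown one matches none;
    # both are rejected rather than guessed.
    if len(matches) != 1:
        raise ValueError(f"invalid output_type {output_type!r}; "
                         f"expected one of {VALID_OUTPUT_TYPES}")
    return matches[0]
```

Prefix matching works here because the three valid values already differ in their first letter, so any non-empty prefix is unambiguous.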



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
