[
https://issues.apache.org/jira/browse/MADLIB-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Frank McQuillan resolved MADLIB-1066.
-------------------------------------
Resolution: Fixed
> Pivoting - support array and svec output
> ----------------------------------------
>
> Key: MADLIB-1066
> URL: https://issues.apache.org/jira/browse/MADLIB-1066
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Priority: Minor
> Fix For: v1.11
>
>
> Background
> Follow-on to these JIRAs:
> https://issues.apache.org/jira/browse/MADLIB-908
> https://issues.apache.org/jira/browse/MADLIB-1004
> This capability carries over some good ideas from
> https://issues.apache.org/jira/browse/MADLIB-1038
> Story
> Support an array output format to allow more than 1600 output columns (or
> whatever the PostgreSQL limit is). Many MADlib algorithms take array input,
> so pivot should support array output. Base this on how it is done for
> encoding categorical variables:
> http://madlib.incubator.apache.org/docs/latest/group__grp__encode__categorical.html
> Add 'output_type' to interface:
> {code}
> pivot(
>     source_table,
>     output_table,
>     index,
>     pivot_cols,
>     pivot_values,
>     aggregate_func,
>     fill_value,
>     keep_null,
>     output_col_dictionary,
>     output_type  -- New
> )
> {code}
> where
> {code}
> output_type (optional)
> VARCHAR. default: 'column'. This parameter controls the output format. If
> 'column', a column is created for each output variable. PostgreSQL limits the
> number of columns in a table. If the total number of columns exceeds the
> limit, then make this parameter either 'array' to combine the indicator
> columns into an array or 'svec' to cast the array output to 'madlib.svec'
> type.
> Since the array output for any single tuple would be sparse, the 'svec'
> output would be most efficient for storage. The 'array' output is useful if
> the array is used for post-processing, including concatenating with other
> non-categorical features.
> A dictionary will be created when 'output_type' is 'array' or 'svec' to
> define an index into the array. The dictionary table will be named after
> the 'output_table' with '_dictionary' appended.
> {code}
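
To illustrate the 'array' and 'svec' descriptions above, here is a minimal Python sketch of the idea, not MADlib's implementation. The function names, the in-memory row format, the summing aggregate, and the shape of the dictionary are all assumptions made for illustration; the issue does not specify the layout of the real dictionary table. The run-length helper only shows why a sparse array compresses well, and is not the madlib.svec type itself.

```python
from collections import defaultdict


def pivot_to_array(rows, index, pivot_col, value_col, fill_value=0):
    """Pivot rows into one array per index value, plus a position dictionary.

    Hypothetical helper: values that share an (index, pivot value) pair
    are summed, and missing cells receive fill_value.
    """
    pivot_values = sorted({r[pivot_col] for r in rows})
    # Positions are 1-based, matching PostgreSQL array indexing.
    dictionary = {pos: val for pos, val in enumerate(pivot_values, start=1)}
    position = {val: pos for pos, val in dictionary.items()}

    sums = defaultdict(dict)
    for r in rows:
        cell = sums[r[index]]
        pos = position[r[pivot_col]]
        cell[pos] = cell.get(pos, 0) + r[value_col]

    width = len(pivot_values)
    output = {
        key: [cell.get(pos, fill_value) for pos in range(1, width + 1)]
        for key, cell in sums.items()
    }
    return output, dictionary


def run_length_encode(array):
    """Return (counts, values) describing consecutive runs in array."""
    counts, values = [], []
    for x in array:
        if values and values[-1] == x:
            counts[-1] += 1
        else:
            counts.append(1)
            values.append(x)
    return counts, values


rows = [
    {"id": 1, "piv": "a", "val": 10},
    {"id": 1, "piv": "c", "val": 5},
    {"id": 2, "piv": "b", "val": 7},
    {"id": 1, "piv": "a", "val": 2},
]
output, dictionary = pivot_to_array(rows, "id", "piv", "val")
```

With these sample rows, each index value gets a single three-element array, and the dictionary records which pivot value each array position stands for. A long run of fill values collapses to one (count, value) pair under run-length encoding, which is the storage argument made for 'svec'.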
> See the code in
> http://madlib.incubator.apache.org/docs/latest/group__grp__encode__categorical.html
> Need to support NULL (treated as the default, 'column'). Also, 'a', 'Array',
> and 'arr' should be interpreted as 'array'. Same idea for 'column' and 'svec'.
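
The matching rule requested above could be sketched in Python as a case-insensitive prefix match. This is an assumption about how to satisfy the requirement, not the code that was committed; `normalize_output_type` and `VALID_OUTPUT_TYPES` are hypothetical names.

```python
VALID_OUTPUT_TYPES = ("column", "array", "svec")


def normalize_output_type(value):
    """Map a user-supplied output_type to 'column', 'array', or 'svec'.

    None (SQL NULL) falls back to the default, 'column'. Any other value
    must be a case-insensitive, non-empty prefix of exactly one valid type.
    """
    if value is None:
        return "column"
    key = value.strip().lower()
    matches = [t for t in VALID_OUTPUT_TYPES if key and t.startswith(key)]
    if len(matches) != 1:
        raise ValueError(
            "output_type must be a prefix of one of %s, got %r"
            % (", ".join(VALID_OUTPUT_TYPES), value)
        )
    return matches[0]
```

Because the three valid names start with different letters, even a single character is unambiguous, so 'a', 'Array', and 'arr' all resolve to 'array'.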