[ https://issues.apache.org/jira/browse/MADLIB-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Iyer updated MADLIB-1013:
-------------------------------
         Assignee: Rahul Iyer
    Fix Version/s: v1.9.2

> Add array output to create_indicator_variables
> ----------------------------------------------
>
>                 Key: MADLIB-1013
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1013
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Utilities
>            Reporter: Rahul Iyer
>            Assignee: Rahul Iyer
>             Fix For: v1.9.2
>
>
> Feature request from Satoshi Nagayasu
> -------------------------------------
> I'm trying to use create_indicator_variables() to encode categorical variables.
> https://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html
> I found that PostgreSQL limits the number of entries in a SELECT list
> (called the target list in PostgreSQL) to at most 1664.
> You will see this error when a variable has more than 1664 categories:
> spiexceptions.ProgramLimitExceeded: target lists can have at most 1664 entries
> I'm now considering using PostgreSQL arrays to hold the indicators instead of
> allocating a single column per category.
> If create_indicator_variables() supported arrays as its output, we could
> handle categorical variables that have more than 1664 categories. Ideally,
> I would also like to store the arrays as sparse vectors (svec) to compress them.
> https://madlib.incubator.apache.org/docs/latest/group__grp__svec.html
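The array-based output proposed above can be sketched outside the database. The Python below is a minimal illustration under stated assumptions, not MADlib's implementation or API: the helpers `build_category_index`, `indicator_array`, `run_length_encode` and `svec_text` are hypothetical, and categories are assumed to get array positions in sorted order. Each row receives one array with a slot per category, so 2000 categories cost one output column instead of 2000. The mostly-zero array is then run-length compressed into a `{counts}:{values}` string, which we assume matches the svec text form; check the svec documentation linked above before relying on that format.

```python
from typing import Dict, List, Sequence, Tuple


def build_category_index(values: Sequence[str]) -> Dict[str, int]:
    """Hypothetical helper: give each distinct category a fixed array slot (sorted order)."""
    return {cat: i for i, cat in enumerate(sorted(set(values)))}


def indicator_array(value: str, index: Dict[str, int]) -> List[int]:
    """Dense 0/1 indicator array: one slot per category instead of one column per category."""
    arr = [0] * len(index)
    arr[index[value]] = 1  # raises KeyError for a category not seen when building the index
    return arr


def run_length_encode(arr: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Compress an array into parallel lists of run lengths and run values."""
    counts: List[int] = []
    vals: List[int] = []
    for x in arr:
        if vals and vals[-1] == x:
            counts[-1] += 1  # extend the current run
        else:
            counts.append(1)  # start a new run
            vals.append(x)
    return counts, vals


def svec_text(counts: Sequence[int], vals: Sequence[int]) -> str:
    """Render runs as '{counts}:{values}' (assumed svec-style text form)."""
    return "{" + ",".join(map(str, counts)) + "}:{" + ",".join(map(str, vals)) + "}"


# 2000 categories: more than the 1664-entry target-list limit if encoded as columns.
categories = [f"c{i:04d}" for i in range(2000)]
index = build_category_index(categories)
row = indicator_array("c0005", index)
counts, vals = run_length_encode(row)
print(len(row))                 # 2000
print(svec_text(counts, vals))  # {5,1,1994}:{0,1,0}
```

Because an indicator array contains exactly one 1, its run-length form has at most three runs no matter how many categories there are, which is why a sparse vector suits this kind of output.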



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
