[jira] [Commented] (MADLIB-1050) Encoding of categorical variables limited to ~1600 columns?

Frank McQuillan (JIRA) Fri, 02 Dec 2016 12:16:45 -0800

    [ https://issues.apache.org/jira/browse/MADLIB-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15716163#comment-15716163 ]


Frank McQuillan commented on MADLIB-1050:
-----------------------------------------

Array output for encoded categorical variables is part of the work now in
progress under
https://issues.apache.org/jira/browse/MADLIB-1038
which is slated for the next release, v1.10. It should make what you are
trying to do much easier.

Until that is available, you will need to stay within the PostgreSQL column 
limits.  One workaround is "stitching": split the categorical variables into 
groups, encode each group in a separate run so that every output table stays 
under the limit, then combine the results into a single array at the end to 
feed into the regression model.
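
To make the stitching idea concrete, here is a minimal sketch in plain Python. 
It is not MADlib code and does not touch the database: the helper names 
(one_hot_group, stitch), the sample rows, and the MAX_COLUMNS budget are all 
made up for illustration. Each call to one_hot_group stands in for one 
encoding run whose output must fit in a single table.

```python
# Illustrative only: plain Python stands in for the per-run encoding tables.
MAX_COLUMNS = 1600  # assumed per-table column budget for each encoding run


def one_hot_group(rows, columns):
    """Encode one group of categorical columns; return (labels, vectors)."""
    # One indicator slot per (column, distinct value) pair, in sorted order.
    labels = [(col, val)
              for col in columns
              for val in sorted({row[col] for row in rows})]
    if len(labels) > MAX_COLUMNS:
        raise ValueError("group too wide for a single run: %d" % len(labels))
    vectors = [[1 if row[col] == val else 0 for col, val in labels]
               for row in rows]
    return labels, vectors


def stitch(rows, column_groups):
    """Encode each group in its own run, then concatenate per-row arrays."""
    all_labels = []
    stitched = [[] for _ in rows]
    for group in column_groups:
        labels, vectors = one_hot_group(rows, group)
        all_labels.extend(labels)
        for full, part in zip(stitched, vectors):
            full.extend(part)
    return all_labels, stitched


# Hypothetical sample data: two categorical variables, encoded in two runs.
rows = [
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "L"},
    {"color": "red", "size": "L"},
]
labels, features = stitch(rows, [["color"], ["size"]])
```

In this sketch a single variable with more distinct values than the budget 
raises an error; such a variable would itself have to be split across runs 
(for example by ranges of values) before stitching.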

> Encoding of categorical variables limited to ~1600 columns?
> -----------------------------------------------------------
>
>                 Key: MADLIB-1050
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1050
>             Project: Apache MADlib
>          Issue Type: Bug
>            Reporter: Maximilian Schleich
>
> Hello,
> I am trying to use the dummy encoding for categorical variables and feed it
> to a linear regression model. My dataset, however, has more than 1664
> categories, so Postgres cannot store it in one table. Is there any other way
> of encoding dummy variables that does not require the creation of a new
> table? Perhaps the function could be streamlined into the regression model?
> Thank you for your help!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)