Hi,

I'm trying to use create_indicator_variables() to encode categorical variables.

https://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html

I found that PostgreSQL limits the number of entries in a SELECT list
(called the target list in PostgreSQL) to at most 1664.

You may see the following error when your variable has more than 1664 categories:

spiexceptions.ProgramLimitExceeded: target lists can have at most 1664 entries
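To make the limit concrete, here is a rough Python sketch of the column-per-category approach. It is not MADlib's actual code; `indicator_columns_sql` and `fits_in_target_list` are hypothetical helpers. It emits one CASE expression per category, so the SELECT list grows linearly with the number of categories. The 1664 constant is PostgreSQL's MaxTupleAttributeNumber, which is where this error comes from.

```python
# PostgreSQL's MaxTupleAttributeNumber: the cap behind
# "target lists can have at most 1664 entries".
PG_MAX_TARGET_ENTRIES = 1664


def indicator_columns_sql(column, categories):
    """Hypothetical helper: one 0/1 CASE expression per category,
    as a column-per-category encoding would put in the SELECT list."""
    exprs = []
    for i, cat in enumerate(categories):
        # Escape single quotes for a SQL string literal.
        literal = "'" + str(cat).replace("'", "''") + "'"
        exprs.append(
            f"CASE WHEN {column} = {literal} THEN 1 ELSE 0 END AS {column}_{i}"
        )
    return exprs


def fits_in_target_list(n_indicator_cols, n_other_cols=0):
    """True if the SELECT list stays within PostgreSQL's limit."""
    return n_indicator_cols + n_other_cols <= PG_MAX_TARGET_ENTRIES


categories = [f"c{i}" for i in range(2000)]
exprs = indicator_columns_sql("city", categories)
print(len(exprs), fits_in_target_list(len(exprs)))  # 2000 False
```

Note that any other columns in the same SELECT count against the limit too, so the threshold can be reached with somewhat fewer than 1664 categories.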

Now I'm considering using PostgreSQL arrays to hold the indicators instead of
allocating a single column per category.
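A minimal sketch of the array idea, in Python for illustration only. The helper names are hypothetical and the sorted category order is an assumption. Each category gets a fixed position, and each row becomes one array rather than one column per category, so the target list needs a single entry no matter how many categories there are.

```python
def build_category_index(categories):
    """Map each distinct category to a fixed array position.
    Sorting gives a deterministic order (an assumption; any stable order works)."""
    return {cat: pos for pos, cat in enumerate(sorted(set(categories)))}


def indicator_array(value, index):
    """Dense 0/1 indicator array: 1 at the value's position, 0 elsewhere.
    Values not in the index (e.g. NULL/None) give an all-zero array."""
    vec = [0] * len(index)
    pos = index.get(value)
    if pos is not None:
        vec[pos] = 1
    return vec


index = build_category_index(["tokyo", "osaka", "kyoto", "osaka"])
print(index)                            # {'kyoto': 0, 'osaka': 1, 'tokyo': 2}
print(indicator_array("osaka", index))  # [0, 1, 0]
print(indicator_array(None, index))     # [0, 0, 0]
```

The category-to-position mapping would need to be stored alongside the output so that array positions can be interpreted later.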

If create_indicator_variables() supported arrays as its output, it would
allow us to deal with categorical variables that have more than 1664 categories.
And of course, I would like to use the sparse vector type to compress them.

https://madlib.incubator.apache.org/docs/latest/group__grp__svec.html
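Since each indicator array has a single 1 among many 0s, it compresses well with run-length encoding, which, as I understand the svec docs, is how svec stores vectors. Below is a rough Python sketch, not the svec implementation. The helper names are hypothetical, and the '{counts}:{values}' literal form is assumed from the examples in the svec documentation.

```python
def run_length_encode(values):
    """Collapse consecutive equal values into parallel (counts, values) lists."""
    counts, vals = [], []
    for v in values:
        if vals and vals[-1] == v:
            counts[-1] += 1
        else:
            counts.append(1)
            vals.append(v)
    return counts, vals


def run_length_decode(counts, vals):
    """Expand runs back into the dense array."""
    out = []
    for c, v in zip(counts, vals):
        out.extend([v] * c)
    return out


def to_svec_literal(counts, vals):
    """Format runs as '{counts}:{values}' text (assumed svec input form)."""
    return "{" + ",".join(map(str, counts)) + "}:{" + ",".join(map(str, vals)) + "}"


dense = [0] * 2000
dense[1234] = 1  # indicator for the category at position 1234
counts, vals = run_length_encode(dense)
print(counts, vals)                   # [1234, 1, 765] [0, 1, 0]
print(to_svec_literal(counts, vals))  # {1234,1,765}:{0,1,0}
assert run_length_decode(counts, vals) == dense
```

Regardless of the number of categories, a one-hot row encodes to at most three runs (zeros, the single 1, zeros).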

Does this sound good to you? Any comments?

Regards,
-- 
Satoshi Nagayasu <[email protected]>
