Hi,

I'm trying out create_indicator_variables() to encode categorical variables.
https://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html

I found that PostgreSQL limits the number of entries in a SELECT list (called the target list in PostgreSQL) to 1664. You will see the following error when a variable has more than 1664 categories:

spiexceptions.ProgramLimitExceeded: target lists can have at most 1664 entries

So I'm considering using PostgreSQL arrays to hold the indicators instead of allocating one column per category. If create_indicator_variables() supported arrays as its output, we could handle categorical variables with more than 1664 categories.

I would also like to use the sparse vector type to compress them.
https://madlib.incubator.apache.org/docs/latest/group__grp__svec.html

Does this sound good to you? Any comments?

Regards,
--
Satoshi Nagayasu <[email protected]>
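
P.S. To make the idea concrete, here is a rough sketch in plain Python of what I have in mind. It is not MADlib code: the function names are hypothetical, the 2000-category variable is made up, and the "{counts}:{values}" text form is only my assumption of how a run-length-compressed sparse vector could be written out. It just shows that a single array value can carry more indicators than the 1664-column limit allows, and that a one-hot array compresses to three runs.

```python
from itertools import groupby

# Limit quoted in the PostgreSQL error message above.
PG_MAX_TARGET_ENTRIES = 1664


def indicator_array(value, categories):
    """Return a dense 0/1 list with a 1 at the position of `value` (hypothetical helper)."""
    return [1 if c == value else 0 for c in categories]


def run_length_encode(dense):
    """Compress a dense list into parallel lists of run lengths and run values."""
    counts, values = [], []
    for v, run in groupby(dense):
        counts.append(sum(1 for _ in run))
        values.append(v)
    return counts, values


def to_svec_like_text(dense):
    """Render the run-length form as '{counts}:{values}' (assumed text form)."""
    counts, values = run_length_encode(dense)
    return "{%s}:{%s}" % (
        ",".join(str(c) for c in counts),
        ",".join(str(v) for v in values),
    )


# Made-up variable with 2000 categories, more than one column per category allows.
categories = ["cat_%04d" % i for i in range(2000)]
dense = indicator_array("cat_1700", categories)

print(len(categories) > PG_MAX_TARGET_ENTRIES)  # True
print(to_svec_like_text(dense))                 # {1700,1,299}:{0,1,0}
```

With one array (or sparse vector) column per categorical variable, the width of the SELECT list no longer depends on the number of categories.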
