Thanks, Satoshi. I've created a feature request (MADLIB-1013 <https://issues.apache.org/jira/browse/MADLIB-1013>). It will probably go into 1.9.2, since we've already started the release process for 1.9.1.

On Mon, Aug 8, 2016 at 7:21 PM, Satoshi Nagayasu <sn...@uptime.jp> wrote:

> Hi Rahul,
>
> 2016-08-09 2:05 GMT+09:00 Rahul Iyer <ri...@pivotal.io>:
> > Array output for *create_indicator_variables* would be quite helpful when
> > number of categories is large and the svec representation would be ideal
> > for it. There might be similar implications for *pivoting*, but we can keep
> > that as future discussion.
>
> Sounds great.
>
> > I'm curious about how you're using the indicator variables - svec is not
> > widely supported in MADlib (yet) and might not give much benefit after the
> > encoding is complete.
>
> I'm trying to implement some recommendation or similarity search stuff
> for several media items (movies, books, documents, else) with its metadata.
> It has several categorical variables, such as authors, publishers,
> actors/actresses, genres, else. Some of them have many categories.
>
> BTW, I'm a starter of data-mining and machine-learning, not having much
> experience.
>
> Of course, I can reduce number of those categories, but playing with raw
> data would be more fun. :)
>
> Regards,
> --
> Satoshi Nagayasu <sn...@uptime.jp>
>

--
---------------------------------------------------------
Rahul Iyer
Principal software engineer | Predictive Analytics
*Pivotal*
*A new platform for a new era*