Thanks, Gautam, for your email on factorization machines. This is a really interesting approach to some of the issues SVMs have with sparse tuple interactions. For folks interested in building recommender systems on Greenplum or HDB/HAWQ, FMs could be a key building block.
So I think this would be a great addition to MADlib, and I would be happy to help in any way to get it into the MADlib core project, e.g., by contributing Pivotal resources for scale testing on very large data sets. I know Xiaocheng Teng has experience in this domain, so perhaps he would like to help marshal this along? To make the discussion concrete for folks new to FMs, I have also appended a couple of small Python sketches below Gautam's message.

Frank

On Thu, Apr 7, 2016 at 10:45 AM, Gautam Muralidhar <[email protected]> wrote:

> Hi MADlib Developers,
>
> I have implemented Steffen Rendle's Factorization Machines
> <http://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf> in Python and
> on MPP. You can find the pure Python implementation and its MPP variant
> for Greenplum and HAWQ here:
>
> 1. Factorization Machines - Pure Python
> <https://github.com/gautamsm/data-science-on-mpp/blob/master/ml_algorithms/factorization_machines.ipynb>
>
> 2. Factorization Machines - MPP
> <https://github.com/gautamsm/data-science-on-mpp/blob/master/ml_algorithms/factorization_machines_mpp.ipynb>
>
> *A quick summary of factorization machines:*
>
> Typically, when building a recommender system, you not only have tuple
> interactions (user-item, user-offer, user-TV show, user-purchase, etc.)
> but also side information in the form of user and item features.
> Historically, algorithms such as SVMs and dense kernel methods have not
> been good at estimating parameters for recommenders, owing to the
> sparsity of the tuple interactions. Bayesian collaborative filtering
> approaches have had success, but incorporating side information entails
> fairly complex hierarchical modeling, which in turn requires MCMC or
> Gibbs sampling to infer the model parameters.
>
> Rendle's work is an extremely interesting and novel way of thinking about
> this problem and seamlessly blends tuple interactions with side
> information. Since tuple interactions are modeled via a low-rank matrix
> decomposition, the quality of parameter estimation does not suffer from
> sparsity the way it does for SVMs. I found the approach really elegant,
> and further reading, including a few great posts on Quora (one by a
> leading name in the field, Xavier Amatriain
> <https://www.quora.com/What-are-some-of-the-best-recommendation-algorithms>),
> convinced me of the merit of this work.
>
> A couple of key points about factorization machines:
>
> 1. They can be seen as a generalization of SVMs and SVD++.
>
> 2. *They can be applied to any real-valued feature vector for regression
> or classification problems (think of them in place of linear/logistic
> regression, SVMs, etc.).*
>
> *A quick note on my MPP implementation:*
>
> Function interface:
>
> fm_fit (
>     src_table varchar,
>     id_col_name varchar,
>     x_col_name varchar,
>     y_col_name varchar,
>     parameters varchar,
>     out_table varchar
> )
>
> Parameters of the fm_fit function:
>
> src_table - table containing the feature vectors and the dependent
> variable as two separate columns
>
> id_col_name - an id column for each row
>
> x_col_name - the feature vector column
>
> y_col_name - the dependent variable column
>
> parameters - learning parameters: regularizers (lambda0, lambda1,
> lambda2), number of iterations (n_iter), learning rate (learning_rate),
> factorization hyper-parameter (k), random seed (an integer, used by
> batch-sgd), and optimizer choice (optim: batch or batch-sgd)
>
> out_table - name of the table where the model will be stored
>
> 1. The MPP implementation currently covers the regression problem
> (squared loss). I plan to implement classification (hinge/log loss) soon.
>
> 2. The implementation assumes your training data is randomly distributed
> across segments. Two gradient descent variants are provided: mini-batch
> SGD and regular batch gradient descent. The mini-batch SGD implementation
> closely follows this paper by Zinkevich et al.
> <http://www.research.rutgers.edu/~lihong/pub/Zinkevich11Parallelized.pdf>
>
> 3. Lastly, the time complexity of factorization machines is roughly
> O(TKN), where T is the number of SGD iterations, K is the number of
> latent factors, and N is the feature dimension. The implementation
> involves a couple of loops for estimating the parameters of the tuple
> interactions, which I could not vectorize efficiently (vectorizing them
> meant building 3-D matrices, which was even slower). Consequently, an
> implementation of the SGD in C or C++ (as in MADlib) will be much faster
> than the PL/Python approach.
>
> Many thanks to Xiaocheng Teng (MADlib committer) for discussions on
> parallel SGD algorithms.
>
> Does the community feel this would be a good addition to MADlib?
>
> Regards,
>
> Gautam
>
> --
> ------------------------------------------------------------------
> Gautam S. Muralidhar, Ph.D.
> Sr. Data Scientist, Pivotal Software, Inc.
> ------------------------------------------------------------------
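
P.S. As promised above, here is a minimal NumPy sketch of the degree-2
factorization machine prediction, to make the model equation concrete. This
is my own illustration, not code from Gautam's notebooks; the names
(fm_predict, V for the latent-factor matrix) are mine. It uses Rendle's
reformulation of the pairwise term, which is what gives the O(KN)
per-example cost behind the O(TKN) figure in Gautam's point 3.

    import numpy as np

    def fm_predict(x, w0, w, V):
        """Degree-2 FM prediction.

        x  : (n,) feature vector
        w0 : scalar bias
        w  : (n,) linear weights
        V  : (n, k) latent-factor matrix; row i is v_i
        """
        # Pairwise term sum_{i<j} <v_i, v_j> x_i x_j, rewritten per Rendle
        # (2010) as 0.5 * sum_f [(sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2],
        # which costs O(nk) instead of O(n^2 k).
        s = V.T @ x                    # (k,) factor-wise weighted sums
        s2 = (V ** 2).T @ (x ** 2)     # (k,) factor-wise sums of squares
        return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)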
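
And a companion sketch of a single SGD update under squared loss, again
purely illustrative rather than the MPP code (it reuses fm_predict from the
sketch above). The lam0/lam1/lam2 arguments are meant to mirror the lambda0,
lambda1, lambda2 regularizers in Gautam's parameter string; the actual
mini-batch variant would additionally average per-segment models along the
lines of the Zinkevich et al. paper.

    def fm_sgd_step(x, y, w0, w, V, lr, lam0, lam1, lam2):
        """One SGD update for a single (x, y) example, loss
        0.5 * (yhat - y)^2 plus L2 penalties on w0, w, and V.

        Gradients of yhat (Rendle 2010):
          d/d w0     = 1
          d/d w_i    = x_i
          d/d V[i,f] = x_i * s_f - V[i,f] * x_i^2, with s = V.T @ x
        """
        s = V.T @ x
        err = fm_predict(x, w0, w, V) - y
        grad_V = np.outer(x, s) - V * (x ** 2)[:, None]
        w0 = w0 - lr * (err + lam0 * w0)
        w = w - lr * (err * x + lam1 * w)
        V = V - lr * (err * grad_V + lam2 * V)
        return w0, w, V

    # Toy usage: 5 features, k = 2 latent factors.
    rng = np.random.default_rng(0)
    x = rng.random(5)
    w0, w, V = 0.0, np.zeros(5), 0.01 * rng.standard_normal((5, 2))
    w0, w, V = fm_sgd_step(x, y=1.0, w0=w0, w=w, V=V,
                           lr=0.01, lam0=0.0, lam1=0.01, lam2=0.01)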
