Thanks, Gautam, for your email on factorization machines. This is a really interesting approach to some of the issues SVMs have with sparse tuple interactions. For folks interested in building recommender systems on Greenplum or HDB/HAWQ, FMs could be a key building block.
So I think this would be a great addition to MADlib, and I would be happy to help in any way to get it into the MADlib core project, e.g., by contributing Pivotal resources for scale testing on very large data sets. I know Xiaocheng Teng has experience in this domain, so perhaps he would like to help marshal this along? To make the discussion concrete for folks new to FMs, I have also appended a couple of small Python sketches below Gautam's message.

Frank

On Thu, Apr 7, 2016 at 10:45 AM, Gautam Muralidhar <[email protected]> wrote:

> Hi MADlib Developers,
>
> I have implemented Steffen Rendle's Factorization Machines
> <http://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf> in Python and
> on MPP. You can find the pure Python implementation and its MPP variant
> for Greenplum and HAWQ here:
>
> 1. Factorization Machines - Pure Python
> <https://github.com/gautamsm/data-science-on-mpp/blob/master/ml_algorithms/factorization_machines.ipynb>
>
> 2. Factorization Machines - MPP
> <https://github.com/gautamsm/data-science-on-mpp/blob/master/ml_algorithms/factorization_machines_mpp.ipynb>
>
> *A quick summary of factorization machines:*
>
> Typically, when building a recommender system, you not only have tuple
> interactions (user-item, user-offer, user-TV show, user-purchase, etc.)
> but also side information in the form of user and item features.
> Historically, algorithms such as SVMs and dense kernel methods have not
> been good at estimating parameters for recommenders, owing to the
> sparsity of the tuple interactions. Bayesian collaborative filtering
> approaches have had success, but incorporating side information entails
> fairly complex hierarchical modeling, which in turn requires MCMC or
> Gibbs sampling to infer the model parameters.
>
> Rendle's work is an extremely interesting and novel way of thinking about
> this problem and seamlessly blends tuple interactions with side
> information. Since tuple interactions are modeled via a low-rank matrix
> decomposition, the quality of parameter estimation does not suffer from
> sparsity the way it does for SVMs. I found the approach really elegant,
> and further reading, including a few great posts on Quora (one by a
> leading name in the field, Xavier Amatriain
> <https://www.quora.com/What-are-some-of-the-best-recommendation-algorithms>),
> convinced me of the merit of this work.
>
> A couple of key points about factorization machines:
>
> 1. They can be seen as a generalization of SVMs and SVD++.
>
> 2. *They can be applied to any real-valued feature vector for regression
> or classification problems (think of them in place of linear/logistic
> regression, SVMs, etc.).*
>
> *A quick note on my MPP implementation:*
>
> Function interface:
>
> fm_fit (
>     src_table varchar,
>     id_col_name varchar,
>     x_col_name varchar,
>     y_col_name varchar,
>     parameters varchar,
>     out_table varchar
> )
>
> Parameters of the fm_fit function:
>
> src_table - table containing the feature vectors and the dependent
> variable as two separate columns
>
> id_col_name - an id column for each row
>
> x_col_name - the feature vector column
>
> y_col_name - the dependent variable column
>
> parameters - learning parameters: regularizers (lambda0, lambda1,
> lambda2), number of iterations (n_iter), learning rate (learning_rate),
> factorization hyper-parameter (k), random seed (an integer, used by
> batch-sgd), and optimizer choice (optim: batch or batch-sgd)
>
> out_table - name of the table where the model will be stored
>
> 1. The MPP implementation currently covers the regression problem
> (squared loss). I plan to implement classification (hinge/log loss) soon.
>
> 2. The implementation assumes your training data is randomly distributed
> across segments. Two gradient descent variants are provided: mini-batch
> SGD and regular batch gradient descent. The mini-batch SGD implementation
> closely follows this paper by Zinkevich et al.
> <http://www.research.rutgers.edu/~lihong/pub/Zinkevich11Parallelized.pdf>
>
> 3. Lastly, the time complexity of factorization machines is roughly
> O(TKN), where T is the number of SGD iterations, K is the number of
> latent factors, and N is the feature dimension. The implementation
> involves a couple of loops for estimating the parameters of the tuple
> interactions, which I could not vectorize efficiently (vectorizing them
> meant building 3-D matrices, which was even slower). Consequently, an
> implementation of the SGD in C or C++ (as in MADlib) will be much faster
> than the PL/Python approach.
>
> Many thanks to Xiaocheng Teng (MADlib committer) for discussions on
> parallel SGD algorithms.
>
> Does the community feel this would be a good addition to MADlib?
>
> Regards,
>
> Gautam
>
> --
> ------------------------------------------------------------------
> Gautam S. Muralidhar, Ph.D.
> Sr. Data Scientist, Pivotal Software, Inc.
> ------------------------------------------------------------------
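
P.S. As promised above, here is a minimal NumPy sketch of the degree-2
factorization machine prediction, to make the model equation concrete. This
is my own illustration, not code from Gautam's notebooks; the names
(fm_predict, V for the latent-factor matrix) are mine. It uses Rendle's
reformulation of the pairwise term, which is what gives the O(KN)
per-example cost behind the O(TKN) figure in Gautam's point 3.

    import numpy as np

    def fm_predict(x, w0, w, V):
        """Degree-2 FM prediction.

        x  : (n,) feature vector
        w0 : scalar bias
        w  : (n,) linear weights
        V  : (n, k) latent-factor matrix; row i is v_i
        """
        # Pairwise term sum_{i<j} <v_i, v_j> x_i x_j, rewritten per Rendle
        # (2010) as 0.5 * sum_f [(sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2],
        # which costs O(nk) instead of O(n^2 k).
        s = V.T @ x                    # (k,) factor-wise weighted sums
        s2 = (V ** 2).T @ (x ** 2)     # (k,) factor-wise sums of squares
        return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)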
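
And a companion sketch of a single SGD update under squared loss, again
purely illustrative rather than the MPP code (it reuses fm_predict from the
sketch above). The lam0/lam1/lam2 arguments are meant to mirror the lambda0,
lambda1, lambda2 regularizers in Gautam's parameter string; the actual
mini-batch variant would additionally average per-segment models along the
lines of the Zinkevich et al. paper.

    def fm_sgd_step(x, y, w0, w, V, lr, lam0, lam1, lam2):
        """One SGD update for a single (x, y) example, loss
        0.5 * (yhat - y)^2 plus L2 penalties on w0, w, and V.

        Gradients of yhat (Rendle 2010):
          d/d w0     = 1
          d/d w_i    = x_i
          d/d V[i,f] = x_i * s_f - V[i,f] * x_i^2, with s = V.T @ x
        """
        s = V.T @ x
        err = fm_predict(x, w0, w, V) - y
        grad_V = np.outer(x, s) - V * (x ** 2)[:, None]
        w0 = w0 - lr * (err + lam0 * w0)
        w = w - lr * (err * x + lam1 * w)
        V = V - lr * (err * grad_V + lam2 * V)
        return w0, w, V

    # Toy usage: 5 features, k = 2 latent factors.
    rng = np.random.default_rng(0)
    x = rng.random(5)
    w0, w, V = 0.0, np.zeros(5), 0.01 * rng.standard_normal((5, 2))
    w0, w, V = fm_sgd_step(x, y=1.0, w0=w0, w=w, V=V,
                           lr=0.01, lam0=0.0, lam1=0.01, lam2=0.01)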
