I'm cool with an NMF/PMF addition; you can make pull requests for further comments.
> On Dec 22, 2014, at 10:00 PM, 梁明强 <[email protected]> wrote:
>
> Dear sir,
>
> I read the source code of Mahout, and it seems to have no Matrix
> Factorization (MF) method.
>
> MF is very useful in recommender systems, especially when the dataset is
> very sparse. Even though SVD and QR decomposition have been applied
> successfully to some practical problems, they suffer from severe overfitting
> when applied to sparse matrices.
>
> In fact, Matrix Factorization is quite different from Matrix Decomposition,
> although they look very similar. Simply put, Matrix Decomposition is used to
> reduce dimensionality, and the original matrix needs to be dense; Matrix
> Factorization is usually used to predict users' ratings of items when the
> original matrix is very sparse.
>
> So I want to implement some Matrix Factorization methods for Mahout, such as
> Non-negative Matrix Factorization (NMF) and Probabilistic Matrix
> Factorization (PMF).
>
> If my proposal sounds interesting, do we need to establish a new git branch,
> or should I just fork and open a pull request?
>
> And here is my elementary implementation of NMF in Java, using multiplicative
> update rules. As next steps I want to implement PMF, optimize both, and
> implement a distributed version.
>
> package org.apache.mahout.math;
>
> import java.util.Random;
> import org.apache.mahout.math.function.DoubleFunction;
>
> public class NMF {
>   private Matrix w;
>   private Matrix h;
>
>   /**
>    * @param v original matrix (entries must be non-negative)
>    * @param r model order (rank of the factorization)
>    * @param steps maximum number of iterations
>    * @param errMax stop once the objective decreases by less than this per step
>    */
>   public NMF(Matrix v, int r, int steps, double errMax) {
>     int n = v.rowSize();
>     int m = v.columnSize();
>
>     w = new DenseMatrix(n, r).assign(RANDOMF);
>     h = new DenseMatrix(r, m).assign(RANDOMF);
>
>     double oldObj = calObjectFunction(v, w, h);
>     for (int step = 0; step < steps; step++) {
>
>       // update h: h <- h .* (w^T v) ./ (w^T w h)
>       Matrix wt = w.transpose();
>       Matrix wtv = wt.times(v);
>       Matrix wtwh = wt.times(w.times(h));
>       for (int i = 0; i < r; i++) {
>         for (int j = 0; j < m; j++) {
>           if (wtwh.get(i, j) > 0) {
>             h.set(i, j, h.get(i, j) * (wtv.get(i, j) / wtwh.get(i, j)));
>           }
>         }
>       }
>
>       // update w: w <- w .* (v h^T) ./ (w h h^T), using the updated h
>       Matrix ht = h.transpose();
>       Matrix vht = v.times(ht);
>       Matrix whht = w.times(h).times(ht);
>       for (int i = 0; i < n; i++) {
>         for (int j = 0; j < r; j++) {
>           if (whht.get(i, j) > 0) {
>             w.set(i, j, w.get(i, j) * (vht.get(i, j) / whht.get(i, j)));
>           }
>         }
>       }
>
>       // stop when the objective no longer improves by at least errMax
>       double obj = calObjectFunction(v, w, h);
>       if (oldObj - obj < errMax) {
>         break;
>       }
>       oldObj = obj;
>     }
>   }
>
>   /**
>    * Calculate the value of the objective function (squared reconstruction error).
>    * @param v original matrix
>    * @param w left factor
>    * @param h right factor
>    * @return sum of squared entries of v - w * h
>    */
>   static double calObjectFunction(Matrix v, Matrix w, Matrix h) {
>     Matrix minus = v.minus(w.times(h));
>     double err = 0;
>     for (int i = 0; i < minus.rowSize(); i++) {
>       for (int j = 0; j < minus.columnSize(); j++) {
>         err += minus.get(i, j) * minus.get(i, j);
>       }
>     }
>     return err;
>   }
>
>   // one shared generator instead of a new Random for every matrix entry
>   private static final Random RANDOM = new Random();
>
>   private static final DoubleFunction RANDOMF = new DoubleFunction() {
>     @Override
>     public double apply(double a) {
>       return RANDOM.nextDouble();
>     }
>   };
>
>   public Matrix getW() {
>     return w;
>   }
>
>   public Matrix getH() {
>     return h;
>   }
> }
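
For anyone who wants to try the multiplicative update rules outside Mahout, here is a minimal stand-alone sketch on plain double[][] arrays. The class name NmfSketch, its method names, the seed parameter, and the small EPS term in the denominators are assumptions made for illustration; none of them come from the quoted code or from Mahout's API. Each iteration applies H <- H .* (W^T V) ./ (W^T W H), then W <- W .* (V H^T) ./ (W H H^T), and stops once the squared error improves by less than tol.

```java
import java.util.Random;

// Hypothetical stand-alone sketch; NmfSketch is not a Mahout class.
final class NmfSketch {
    // Assumption: a tiny constant keeps the element-wise division finite.
    private static final double EPS = 1e-12;

    /** Returns {W, H} with V approximately W*H; V must be non-negative. */
    static double[][][] factorize(double[][] v, int r, int maxSteps, double tol, long seed) {
        int n = v.length;
        int m = v[0].length;
        Random rnd = new Random(seed);
        double[][] w = randomMatrix(n, r, rnd);
        double[][] h = randomMatrix(r, m, rnd);
        double prev = squaredError(v, multiply(w, h));
        for (int step = 0; step < maxSteps; step++) {
            // H <- H .* (W^T V) ./ (W^T W H)
            double[][] wt = transpose(w);
            double[][] wtv = multiply(wt, v);
            double[][] wtwh = multiply(wt, multiply(w, h));
            for (int i = 0; i < r; i++) {
                for (int j = 0; j < m; j++) {
                    h[i][j] *= wtv[i][j] / (wtwh[i][j] + EPS);
                }
            }
            // W <- W .* (V H^T) ./ (W H H^T), using the updated H
            double[][] ht = transpose(h);
            double[][] vht = multiply(v, ht);
            double[][] whht = multiply(multiply(w, h), ht);
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < r; j++) {
                    w[i][j] *= vht[i][j] / (whht[i][j] + EPS);
                }
            }
            double cur = squaredError(v, multiply(w, h));
            if (prev - cur < tol) {
                break; // improvement fell below the tolerance
            }
            prev = cur;
        }
        return new double[][][] {w, h};
    }

    /** Dense matrix product a * b. */
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        int k = b.length;
        int m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int p = 0; p < k; p++) {
                for (int j = 0; j < m; j++) {
                    c[i][j] += a[i][p] * b[p][j];
                }
            }
        }
        return c;
    }

    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[0].length; j++) {
                t[j][i] = a[i][j];
            }
        }
        return t;
    }

    /** Sum of squared differences between v and its approximation. */
    static double squaredError(double[][] v, double[][] approx) {
        double err = 0;
        for (int i = 0; i < v.length; i++) {
            for (int j = 0; j < v[0].length; j++) {
                double d = v[i][j] - approx[i][j];
                err += d * d;
            }
        }
        return err;
    }

    /** Fills a rows x cols matrix with uniform values in [0, 1). */
    private static double[][] randomMatrix(int rows, int cols, Random rnd) {
        double[][] a = new double[rows][cols];
        for (double[] row : a) {
            for (int j = 0; j < cols; j++) {
                row[j] = rnd.nextDouble();
            }
        }
        return a;
    }
}
```

Calling NmfSketch.factorize(v, 2, 500, 1e-9, 42L) on a small non-negative matrix returns the pair {W, H}. Because both factors start non-negative and are only ever multiplied by non-negative ratios, they stay non-negative throughout. Note that this minimizes the error over every entry of V, zeros included, so for a sparse rating matrix the loss would still need to be restricted to the observed entries.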
