Dear sir,

I read the source code of Mahout, it seems hasn’t Matrix Factorization(MF) 
method.

MF is very useful in recommender system, especially when the dataset is very 
sparse. Even though SVD and QR decomposition methods are successfully applied 
to solve some practical problems, it suffers severe overfitting when applied to 
sparse matrices. 

In fact, Matrix Factorization is very different to Matrix Decomposition, 
although they look very similar. Simply put, Matrix Decomposition is used to 
reduce dimension, and the original matrix need to be dense; but Matrix 
Factorization is usually used to predict users' rating on some items when the 
original matrix is very sparse.

So, I want to implement some Matrix Factorization method for Mahout, such as 
Non-negative Matrix Factorization(NMF) and Probabilistic Matrix 
Factorization(PMF).

If my proposal sounds interesting, do we need establish a new git branch, or 
just fork and pull request?

And here is my elementary implementation of NMF in java,using multiplicative 
update rules. In the next step I want to implement PMF, optimize them and 
implement the distribution version.


package org.apache.mahout.math;

import java.util.Random;
import org.apache.mahout.math.function.DoubleFunction;
public class NMF{
    private Matrix w;
    private Matrix h;

    /**
     * @param v Matrix original
     * @param r model order
     * @param steps max steps before converge
     * @param errMax threshold of object function change rate
     */
    public NMF(Matrix v, int r, int steps, double errMax) {
        double oldObj, obj;
        int n = v.rowSize();
        int m = v.columnSize();

        w = new DenseMatrix(n, r).assign(RANDOMF);
        h = new DenseMatrix(r, m).assign(RANDOMF);

        for (int step = 0; step < steps; step++) {

            oldObj = calObjectFunction(v, w, h);

            //update h
            Matrix wtv = wt.times(v);
            Matrix wtwh = wt.times(wh);
            for (int i = 0; i < r; i++) {
                for (int j = 0; j < m; j++) {
                    if(wtwh.get(i, j) > 0)
                        h.set(i, j, h.get(i,j)*
                            (wtv.get(i,j)/wtwh.get(i,j)) );
                }
            }

            // update w
            Matrix vht = v.times(h.transpose());
            Matrix whht = w.times(h).times(h.transpose());
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < r; j++) {
                    if(whht.get(i, j) > 0)
                        w.set(i, j, w.get(i,j)*
                            (vht.get(i,j)/whht.get(i,j)) );
                    
                }
            }


            obj = calObjectFunction(v, w, h);
            double erro = oldObj - obj;
            if(erro < errMax) break;

        }

    }

    /**
     * Calculate the value of object function
     * @param v Matrix original
     * @param w Matrix factor
     * @param h Matrix factor
     * @return Value of object function
     */
    static double calObjectFunction(Matrix v, Matrix w, Matrix h){
        Matrix wh = w.times(h);
        Matrix minus = v.minus(wh);
        double err = 0;
        for(int i=0; i<minus.rowSize(); i++)
            for(int j=0; j<minus.columnSize(); j++)
                err += minus.get(i, j) * minus.get(i, j);
        return err;
    }


    private static final DoubleFunction RANDOMF = new DoubleFunction() {
        @Override
        public double apply(double a) {
            Random random = new Random();
            return random.nextDouble();
        }
    };

    public Matrix getW() {
        return w;
    }

    public Matrix getH() {
        return h;
    }


}

Reply via email to