[ 
https://issues.apache.org/jira/browse/MAHOUT-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707830#comment-13707830
 ] 

Peng Cheng edited comment on MAHOUT-1272 at 7/13/13 8:57 PM:
-------------------------------------------------------------

Tested on the libimseti dataset (http://www.occamslab.com/petricek/data/); 
libimseti is a Czech dating website.
This dataset is used in a live example in the book 'Mahout in Action' (page 
71), written by a few of the people who hang around this site.

Parameters:
  private static final double lambda = 0.1;
  private static final int rank = 16;

  private static int numALSIterations = 5;
  private static int numEpochs = 20;

(for ratingSGD)
  double randomNoise = 0.02;
  double learningRate = 0.01;
  double learningDecayRate = 1;

(for parallelSGD)
  double mu0 = 1;
  double decayFactor = 1;
  int stepOffset = 100;
  double forgettingExponent = -1;
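
To make the parallelSGD parameters easier to compare with the ratingSGD 
learning rate, here is a minimal sketch of how they might combine into a 
per-epoch step size. The formula mu0 * decayFactor^(epoch-1) * (epoch + 
stepOffset)^forgettingExponent is an assumption about the schedule, not 
something stated in this comment, and the class and method names are 
hypothetical.

```java
/** Hypothetical helper; the schedule formula below is an assumption. */
final class LearningRateSchedule {

  /**
   * Assumed schedule:
   * mu0 * decayFactor^(epoch - 1) * (epoch + stepOffset)^forgettingExponent,
   * with epochs counted from 1.
   */
  static double learningRate(double mu0, double decayFactor, int stepOffset,
                             double forgettingExponent, int epoch) {
    return mu0
        * Math.pow(decayFactor, epoch - 1)
        * Math.pow(epoch + stepOffset, forgettingExponent);
  }

  public static void main(String[] args) {
    // Parameter values copied from the list above; numEpochs = 20.
    for (int epoch = 1; epoch <= 20; epoch++) {
      System.out.println(epoch + ": " + learningRate(1, 1, 100, -1, epoch));
    }
  }
}
```

Under that assumption the step size is 1/(epoch + 100), so it falls from about 
0.0099 in the first epoch to about 0.0083 in the twentieth, which is close to 
the fixed learningRate=0.01 used for ratingSGD.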

Results (average absolute difference; ratings are on a 1-10 scale):

INFO: ==================Recommender With ALSWRFactorizer: 1.5623366369454739 
time spent: 41.24s===================
(Note that the number of ALS iterations is much smaller than the number of 
epochs used for the others, which leads to a suboptimal result, but that is 
not the point of this test.)
Jul 13, 2013 4:39:34 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: ==================Recommender With RatingSGDFactorizer: 1.28022379922957 
time spent: 118.188s===================
Jul 13, 2013 4:39:34 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: ==================Recommender With ParallelSGDFactorizer: 
1.2798905733917445 time spent: 21.806s====================

This is the best result I have been able to get. The book claims a best 
result of 1.12 on this dataset, which I never achieved. If you have also 
experimented and found a better parameter set, please post it here.
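
For anyone comparing numbers, this is what I mean by average absolute 
difference: the mean of |predicted - actual| over the held-out ratings. The 
sketch below is a standalone illustration with made-up values and a 
hypothetical class name; the figures above presumably come from Mahout's own 
evaluator rather than from code like this.

```java
/** Hypothetical standalone illustration of the metric. */
final class AverageAbsoluteDifference {

  /** Mean of |predicted[i] - actual[i]| over all held-out ratings. */
  static double of(double[] predicted, double[] actual) {
    if (predicted.length != actual.length || predicted.length == 0) {
      throw new IllegalArgumentException("need two non-empty arrays of equal length");
    }
    double sum = 0;
    for (int i = 0; i < predicted.length; i++) {
      sum += Math.abs(predicted[i] - actual[i]);
    }
    return sum / predicted.length;
  }

  public static void main(String[] args) {
    // Made-up predictions and ratings on the 1-10 scale.
    double[] predicted = {7.5, 3.0, 9.0};
    double[] actual = {8.0, 1.0, 10.0};
    System.out.println(of(predicted, actual)); // (0.5 + 2.0 + 1.0) / 3
  }
}
```

On a 1-10 scale, a value of 1.28 therefore means the predictions are off by 
about 1.3 rating points on average.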
                
> Parallel SGD matrix factorizer for SVDrecommender
> -------------------------------------------------
>
>                 Key: MAHOUT-1272
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1272
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Peng Cheng
>            Assignee: Sean Owen
>              Labels: features, patch, test
>             Fix For: 0.8
>
>         Attachments: GroupLensSVDRecomenderEvaluatorRunner.java, 
> libimsetiSVDRecomenderEvaluatorRunner.java, mahout.patch, 
> ParallelSGDFactorizer.java, ParallelSGDFactorizer.java, 
> ParallelSGDFactorizerTest.java, ParallelSGDFactorizerTest.java
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> A parallel factorizer based on MAHOUT-1089 may achieve better performance on 
> multicore processors.
> The existing code is single-threaded and may still be outperformed by the 
> default ALS-WR.
> In addition, its hardcoded online-to-batch conversion prevents it from being 
> used by an online recommender. An online SGD implementation may help build a 
> high-performance online recommender as a replacement for the outdated 
> slope-one.
> The new factorizer can implement either DSGD 
> (http://www.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf) or 
> Hogwild! (www.cs.wisc.edu/~brecht/papers/hogwildTR.pdf).
> Related discussion has been going on for a while but remains inconclusive:
> http://web.archiveorange.com/archive/v/z6zxQUSahofuPKEzZkzl
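
To illustrate the Hogwild! idea mentioned in the description, here is a 
minimal sketch of lock-free parallel SGD for matrix factorization: several 
threads apply the regularized update for individual ratings to shared feature 
arrays without any synchronization. It is not the attached 
ParallelSGDFactorizer. The class name, the use of a parallel stream, the fixed 
learning rate, and the synthetic ratings are all assumptions made for 
illustration.

```java
import java.util.Random;
import java.util.stream.IntStream;

/** Hypothetical illustration; not the attached ParallelSGDFactorizer. */
final class HogwildSketch {
  private final double[][] userFeatures;
  private final double[][] itemFeatures;
  private final double lambda;
  private final double learningRate;

  HogwildSketch(int numUsers, int numItems, int rank,
                double lambda, double learningRate, long seed) {
    Random random = new Random(seed);
    this.userFeatures = randomMatrix(numUsers, rank, random);
    this.itemFeatures = randomMatrix(numItems, rank, random);
    this.lambda = lambda;
    this.learningRate = learningRate;
  }

  private static double[][] randomMatrix(int rows, int cols, Random random) {
    double[][] m = new double[rows][cols];
    for (double[] row : m) {
      for (int k = 0; k < cols; k++) {
        row[k] = random.nextDouble();
      }
    }
    return m;
  }

  double predict(int user, int item) {
    double sum = 0;
    for (int k = 0; k < userFeatures[user].length; k++) {
      sum += userFeatures[user][k] * itemFeatures[item][k];
    }
    return sum;
  }

  /** One regularized SGD step for a single rating; deliberately unsynchronized. */
  void update(int user, int item, double rating) {
    double[] p = userFeatures[user];
    double[] q = itemFeatures[item];
    double err = rating - predict(user, item);
    for (int k = 0; k < p.length; k++) {
      double pk = p[k];
      double qk = q[k];
      p[k] = pk + learningRate * (err * qk - lambda * pk);
      q[k] = qk + learningRate * (err * pk - lambda * qk);
    }
  }

  /** Each epoch visits every rating once, spread over worker threads without locks. */
  void train(int[] users, int[] items, double[] ratings, int numEpochs) {
    for (int epoch = 0; epoch < numEpochs; epoch++) {
      IntStream.range(0, ratings.length).parallel()
          .forEach(n -> update(users[n], items[n], ratings[n]));
    }
  }

  double averageAbsoluteDifference(int[] users, int[] items, double[] ratings) {
    double sum = 0;
    for (int n = 0; n < ratings.length; n++) {
      sum += Math.abs(predict(users[n], items[n]) - ratings[n]);
    }
    return sum / ratings.length;
  }

  /** Trains on made-up ratings in 1..10 and returns {error before, error after}. */
  static double[] demo() {
    int numUsers = 30;
    int numItems = 20;
    int[] users = new int[numUsers * numItems];
    int[] items = new int[numUsers * numItems];
    double[] ratings = new double[numUsers * numItems];
    for (int u = 0, n = 0; u < numUsers; u++) {
      for (int i = 0; i < numItems; i++, n++) {
        users[n] = u;
        items[n] = i;
        ratings[n] = 1 + (u * 3 + i * 5) % 10;
      }
    }
    HogwildSketch model = new HogwildSketch(numUsers, numItems, 4, 0.1, 0.01, 42L);
    double before = model.averageAbsoluteDifference(users, items, ratings);
    model.train(users, items, ratings, 20);
    double after = model.averageAbsoluteDifference(users, items, ratings);
    return new double[] {before, after};
  }

  public static void main(String[] args) {
    double[] result = demo();
    System.out.println("before=" + result[0] + " after=" + result[1]);
  }
}
```

Because concurrent updates can overwrite each other, the exact numbers vary 
from run to run. The Hogwild! argument is that when ratings are sparse such 
collisions are rare enough that convergence is barely affected. The dense 
synthetic data here is only meant to show the mechanics.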

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
