[ 
https://issues.apache.org/jira/browse/MAHOUT-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701672#comment-13701672
 ] 

Peng Cheng commented on MAHOUT-1272:
------------------------------------

Hey honoured contributors, I've got some rough test results for the new
parallel SGD factorizer for CF:

1. parameters:
    lambda = 1e-10
    rank of the factorization (number of features in each user/item vector) = 50
    number of biases: 3 (average rating + user bias + item bias)
    number of iterations/epochs = 2 (for all factorizers including ALSWR, 
ratingSGD and the proposed parallelSGD)
    initial mu/learning rate = 0.01 (for ratingSGD and proposed parallelSGD)
    decay rate of mu = 1 (does not decay) (for ratingSGD and proposed 
parallelSGD)
    other parameters are set to default.
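
For anyone reading along who wants to see how these parameters interact, here
is a minimal sketch of one SGD step for a biased matrix factorization. It is a
generic textbook formulation, not the code in the attached patch: the class
name BiasedSgdSketch and all of its methods are made up for illustration, and
treating the average rating as a fixed constant is an assumption.

```java
import java.util.Random;

/** Hypothetical sketch of biased matrix factorization trained by SGD; not Mahout code. */
final class BiasedSgdSketch {
    final double averageRating;      // assumed fixed, not learned
    final double[] userBias;
    final double[] itemBias;
    final double[][] userFeatures;   // numUsers x rank
    final double[][] itemFeatures;   // numItems x rank

    BiasedSgdSketch(int numUsers, int numItems, int rank, double averageRating, long seed) {
        this.averageRating = averageRating;
        this.userBias = new double[numUsers];
        this.itemBias = new double[numItems];
        this.userFeatures = new double[numUsers][rank];
        this.itemFeatures = new double[numItems][rank];
        Random random = new Random(seed);
        // Small random starting values so the features can move away from zero.
        for (double[] row : userFeatures) {
            for (int f = 0; f < rank; f++) {
                row[f] = 0.01 * random.nextGaussian();
            }
        }
        for (double[] row : itemFeatures) {
            for (int f = 0; f < rank; f++) {
                row[f] = 0.01 * random.nextGaussian();
            }
        }
    }

    /** Prediction = average rating + user bias + item bias + dot product of the features. */
    double predict(int user, int item) {
        double sum = averageRating + userBias[user] + itemBias[item];
        for (int f = 0; f < userFeatures[user].length; f++) {
            sum += userFeatures[user][f] * itemFeatures[item][f];
        }
        return sum;
    }

    /** Applies one SGD step for a single rating and returns the error measured before the step. */
    double update(int user, int item, double rating, double learningRate, double lambda) {
        double err = rating - predict(user, item);
        userBias[user] += learningRate * (err - lambda * userBias[user]);
        itemBias[item] += learningRate * (err - lambda * itemBias[item]);
        double[] p = userFeatures[user];
        double[] q = itemFeatures[item];
        for (int f = 0; f < p.length; f++) {
            double pf = p[f];   // keep the old values so both updates use the same point
            double qf = q[f];
            p[f] += learningRate * (err * qf - lambda * pf);
            q[f] += learningRate * (err * pf - lambda * qf);
        }
        return err;
    }

    /** Learning rate for a given epoch; a decay of 1 keeps it constant. */
    static double learningRate(double initial, double decay, int epoch) {
        return initial * Math.pow(decay, epoch);
    }
}
```

With lambda = 1e-10 the regularization terms are close to zero, so each step is
driven almost entirely by the prediction error, and with a decay of 1 every
epoch uses the same step size of 0.01.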

2. results on movielens-10m (I don't know what happened to ALSWR; the default
hyperparameters must be badly off for it, but my point is the speed advantage):
  a. RMSE

Jul 07, 2013 5:20:23 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: ==================Recommender With ALSWRFactorizer: 3.7709163950800665E21 
time spent: 6.179s===================
Jul 07, 2013 5:20:23 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: ==================Recommender With RatingSGDFactorizer: 
0.8847393972529887 time spent: 6.179s===================
Jul 07, 2013 5:20:23 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: ==================Recommender With ParallelSGDFactorizer: 
0.8805947464818478 time spent: 3.084s====================

  b. Average absolute difference (MAE)

INFO: ==================Recommender With ALSWRFactorizer: 1.2085420449917682E19 
time spent: 7.444s===================
Jul 07, 2013 5:22:39 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: ==================Recommender With RatingSGDFactorizer: 
0.6757777685274206 time spent: 7.444s===================
Jul 07, 2013 5:22:39 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: ==================Recommender With ParallelSGDFactorizer: 
0.6775774766740665 time spent: 2.365s====================

3. results on movielens-1m (on average SGD does worse here than on
movielens-10m; perhaps I should use more iterations/epochs):

  a. RMSE

Jul 07, 2013 5:26:04 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: ==================Recommender With ALSWRFactorizer: 1.3514189134383086E20 
time spent: 0.637s===================
Jul 07, 2013 5:26:04 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: ==================Recommender With RatingSGDFactorizer: 
0.9312989913558529 time spent: 0.637s===================
Jul 07, 2013 5:26:04 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: ==================Recommender With ParallelSGDFactorizer: 
0.9529995632658007 time spent: 0.305s====================

  b. Average absolute difference (MAE)

Jul 07, 2013 5:25:29 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: ==================Recommender With ALSWRFactorizer: 
1.58934499216789965E18 time spent: 0.626s===================
Jul 07, 2013 5:25:29 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: ==================Recommender With RatingSGDFactorizer: 
0.7459565635961599 time spent: 0.626s===================
Jul 07, 2013 5:25:29 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: ==================Recommender With ParallelSGDFactorizer: 
0.7420818642753416 time spent: 0.297s====================
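
For reference, the two error figures reported above can be computed as in the
sketch below. It assumes the "absolute average" figure is the mean of the
absolute differences between actual and predicted ratings; the class and method
names are hypothetical and this is not the evaluator code that produced the
logs.

```java
/** Hypothetical helper for illustration; not part of Mahout. */
final class ErrorMetricsSketch {

    /** Root mean squared error between actual and predicted ratings. */
    static double rmse(double[] actual, double[] predicted) {
        double sumSquared = 0.0;
        for (int k = 0; k < actual.length; k++) {
            double diff = actual[k] - predicted[k];
            sumSquared += diff * diff;
        }
        return Math.sqrt(sumSquared / actual.length);
    }

    /** Mean of the absolute differences between actual and predicted ratings. */
    static double averageAbsoluteDifference(double[] actual, double[] predicted) {
        double sumAbsolute = 0.0;
        for (int k = 0; k < actual.length; k++) {
            sumAbsolute += Math.abs(actual[k] - predicted[k]);
        }
        return sumAbsolute / actual.length;
    }
}
```

RMSE squares each error before averaging, so it weights large misses more
heavily; for the same predictions it is never smaller than the average
absolute difference, which matches the pattern in the logs above.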

Many thanks to Sebastian for his guidance. I'll upload the EvaluatorRunner
class as a mahout-example component, along with the formatted code, shortly.
                
> Parallel SGD matrix factorizer for SVDrecommender
> -------------------------------------------------
>
>                 Key: MAHOUT-1272
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1272
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Peng Cheng
>            Assignee: Sean Owen
>              Labels: features, patch, test
>         Attachments: mahout.patch, ParallelSGDFactorizer.java, 
> ParallelSGDFactorizerTest.java
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> A parallel factorizer based on MAHOUT-1089 may achieve better performance on 
> multicore processors.
> The existing code is single-threaded and may still be outperformed by the 
> default ALS-WR.
> In addition, its hardcoded online-to-batch conversion prevents it from being 
> used by an online recommender. An online SGD implementation may help build a 
> high-performance online recommender as a replacement for the outdated 
> slope-one.
> The new factorizer can implement either DSGD 
> (http://www.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf) or 
> Hogwild! (www.cs.wisc.edu/~brecht/papers/hogwildTR.pdf).
> Related discussion has been going on for a while but remains inconclusive:
> http://web.archiveorange.com/archive/v/z6zxQUSahofuPKEzZkzl
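
To make the Hogwild! option mentioned in the description concrete, here is a
minimal sketch of the idea: several threads run SGD steps on shared factor
arrays without any locking, relying on the fact that two ratings rarely touch
the same user and item at the same moment. This is only an illustration under
stated assumptions, not the attached ParallelSGDFactorizer: the class
HogwildSketch, its methods, the {user, item, value} rating layout, and the use
of a parallel stream as the thread pool are all hypothetical choices, and
biases are left out to keep it short.

```java
import java.util.Random;
import java.util.stream.IntStream;

/** Hypothetical Hogwild!-style sketch; not the code attached to this issue. */
final class HogwildSketch {

    /** Small positive random starting factors. */
    static double[][] init(int rows, int rank, long seed) {
        Random random = new Random(seed);
        double[][] factors = new double[rows][rank];
        for (double[] row : factors) {
            for (int f = 0; f < rank; f++) {
                row[f] = 0.1 + 0.1 * random.nextDouble();
            }
        }
        return factors;
    }

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int f = 0; f < a.length; f++) {
            sum += a[f] * b[f];
        }
        return sum;
    }

    /** One SGD step for rating = {user, item, value}; deliberately unsynchronized. */
    static void step(double[][] p, double[][] q, double[] rating, double learningRate, double lambda) {
        double[] pu = p[(int) rating[0]];
        double[] qi = q[(int) rating[1]];
        double err = rating[2] - dot(pu, qi);
        for (int f = 0; f < pu.length; f++) {
            double pf = pu[f];
            double qf = qi[f];
            pu[f] += learningRate * (err * qf - lambda * pf);
            qi[f] += learningRate * (err * pf - lambda * qf);
        }
    }

    /** Each epoch processes all ratings in parallel on shared arrays, with no locks. */
    static void train(double[][] ratings, double[][] p, double[][] q,
                      int epochs, double learningRate, double lambda) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            IntStream.range(0, ratings.length)
                     .parallel()
                     .forEach(k -> step(p, q, ratings[k], learningRate, lambda));
        }
    }

    static double rmse(double[][] ratings, double[][] p, double[][] q) {
        double sumSquared = 0.0;
        for (double[] rating : ratings) {
            double diff = rating[2] - dot(p[(int) rating[0]], q[(int) rating[1]]);
            sumSquared += diff * diff;
        }
        return Math.sqrt(sumSquared / ratings.length);
    }
}
```

Because the updates race with each other, some writes can be lost and results
are not reproducible from run to run. DSGD takes the opposite approach: it
splits the rating matrix into blocks so that concurrent workers never share a
row or column.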

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
