[ https://issues.apache.org/jira/browse/MAHOUT-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210530#comment-13210530 ]

[email protected] commented on MAHOUT-817:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3863/
-----------------------------------------------------------

(Updated 2012-02-17 20:38:49.925577)


Review request for mahout.


Changes
-------

commit cd4862738fb74f01114e0e4c2fee8a737a009c13
Author: Dmitriy Lyubimov <[email protected]>
Date:   Fri Feb 17 12:35:47 2012 -0800

    Getting rid of prototype code; styling round

:100644 100644 d61210f... ebf087d... M  core/src/main/java/org/apache/mahout/math/hadoop/MatrixColumnMeansJob.java
:100644 100644 254887a... d9c03cb... M  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java
:100644 100644 959d491... 8be8df1... M  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/Omega.java
:100644 000000 59bdedb... 0000000... D  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/PartialRowEmitter.java
:100644 100644 d247af4... 59f64ba... M  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java
:100644 100644 96fe5e1... 1127f6a... M  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java
:100644 000000 09f05d1... 0000000... D  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototype.java
:100644 100644 915fce5... 4168e98... M  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java
:100644 100644 885f5fa... 1346d71... M  core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDPCADenseTest.j
:100644 100644 760c715... 280e10a... M  core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTes
:100644 100644 7015283... 0e34568... M  core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSe
:000000 100644 0000000... 5bb5706... A  core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCommonTest.java
:100644 000000 503433f... 0000000... D  core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototypeTest.java
:100644 100644 32342c1... d6605c1... M  core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDTestsHelper.java


Summary
-------


2d542fd4dfcc6e01577bddc28600632a88e358ee Merge remote-tracking branch 'apache/trunk' into MAHOUT-817
1f245bb5cc1354e7495ec62fbc5f41ed6d590210 Merge branch 'trunk' into MAHOUT-817
458d8112de180c93d5194d67ccfc00442ed1d460 Merge remote-tracking branch 'apache/trunk' into MAHOUT-817
3fea9bd981043e268dd003d4c6c3943bb570c0f7 added test, bug fixes
2725c1061c167126238d288039f0f68baafa7dc8 adding --pca and --pcaOffset options, minor fixes
48c7b425241afff42ce52d3bb005a87aeb68386d fixing front end to factor in the median data.
4e072615ac2b8a256d037aaf00db21820abb91e2 tweaking B' job to produce necessary correctors s_q and s_b
b10fefd8d4aa5a0ed2f60902904d551afbbdf57e cosmetic fixes
849171d3af75117a2ee1115e6d5fc8e4a1fff5ce comment
6c196ea9606b3ca05d401fa1474ee9262a6c0303 retrofitting V job to do pca correction
e6fbe7cdb606698f180127302c33d30fffc6c4d7 adding pca options to Q,ABt jobs. still need to work on B'-job, V-job and front-end pca corrections.
ecf5dd21c5d5805d70715a78abd07246d171536c Computing s_b0
b9b33cf72af85ade16fcfbf4e13a036877489afb comments
9bb6e971c68e0674b087b8c5d64f4967878f1834 More cleanup in favor of standard functions, unit tests pass but need to verify the 2G benchmark.
39faa70158b52e50d31aca2abc4006874a9ea8fd cleanup I
780b291eb902e0e832d41748d45bf6d2163f9537 cosmetic changes, adding api with out redundant parameters
02daf0024489305032320c578ac546c16bda31c1 current MAHOUT-923 patch from Raphael


This addresses bug MAHOUT-817.
    https://issues.apache.org/jira/browse/MAHOUT-817


Diffs (updated)
-----

  core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/DatasetSplitter.java c9003ad
  core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/FactorizationEvaluator.java 0c6e3f7
  core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/ParallelALSFactorizationJob.java 7dc3b79
  core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/RecommenderJob.java 9ca0b16
  core/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java 1feaa03
  core/src/main/java/org/apache/mahout/cf/taste/hadoop/preparation/PreparePreferenceMatrixJob.java fbe8914
  core/src/main/java/org/apache/mahout/cf/taste/hadoop/pseudo/RecommenderJob.java 02d1ba6
  core/src/main/java/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.java 951c860
  core/src/main/java/org/apache/mahout/cf/taste/hadoop/slopeone/SlopeOneAverageDiffsJob.java 57fa036
  core/src/main/java/org/apache/mahout/cf/taste/impl/model/PlusAnonymousConcurrentUserDataModel.java 11eb295
  core/src/main/java/org/apache/mahout/cf/taste/impl/model/PlusAnonymousUserDataModel.java 7f9cfd4
  core/src/main/java/org/apache/mahout/classifier/naivebayes/test/TestNaiveBayesDriver.java 15da502
  core/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainNaiveBayesJob.java 4da6426
  core/src/main/java/org/apache/mahout/clustering/AbstractCluster.java 2ceb01b
  core/src/main/java/org/apache/mahout/clustering/CIMapper.java 5f25f4f
  core/src/main/java/org/apache/mahout/clustering/CIReducer.java 726363e
  core/src/main/java/org/apache/mahout/clustering/Cluster.java 2f8d4dd
  core/src/main/java/org/apache/mahout/clustering/ClusterIterator.java e39c71e
  core/src/main/java/org/apache/mahout/clustering/ClusterWritable.java dba8c37
  core/src/main/java/org/apache/mahout/clustering/ClusteringPolicy.java b07b649
  core/src/main/java/org/apache/mahout/clustering/ClusteringPolicyWritable.java 8c148a8
  core/src/main/java/org/apache/mahout/clustering/DirichletClusteringPolicy.java 116973f
  core/src/main/java/org/apache/mahout/clustering/FuzzyKMeansClusteringPolicy.java 6c39d94
  core/src/main/java/org/apache/mahout/clustering/KMeansClusteringPolicy.java 7b0d874
  core/src/main/java/org/apache/mahout/clustering/Model.java 79dab30
  core/src/main/java/org/apache/mahout/clustering/WeightedPropertyVectorWritable.java 92373eb
  core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java 7147015
  core/src/main/java/org/apache/mahout/clustering/canopy/CanopyMapper.java 52fe865
  core/src/main/java/org/apache/mahout/clustering/canopy/CanopyReducer.java ca814f9
  core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationConfigKeys.java 366ec3c
  core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationDriver.java 49a9cfc
  core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationMapper.java 09be170
  core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletCluster.java 7293479
  core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterer.java 3cf25bc
  core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletState.java d19842f
  core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansClusterer.java 2d882b0
  core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansDriver.java aa7389f
  core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansUtil.java 5f6cb47
  core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/SoftCluster.java 52fd764
  core/src/main/java/org/apache/mahout/clustering/kmeans/Cluster.java PRE-CREATION
  core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterMapper.java 3cf41ec
  core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterer.java 9471e74
  core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansCombiner.java eb086d8
  core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java 1099206
  core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansMapper.java 0945dcb
  core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansReducer.java bb777a4
  core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansUtil.java 1c84f87
  core/src/main/java/org/apache/mahout/clustering/kmeans/Kluster.java 8b22709
  core/src/main/java/org/apache/mahout/clustering/kmeans/RandomSeedGenerator.java 4a725e7
  core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopy.java 28fc43b
  core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyDriver.java a33f1ca
  core/src/main/java/org/apache/mahout/clustering/spectral/eigencuts/EigencutsDriver.java 06e0549
  core/src/main/java/org/apache/mahout/clustering/spectral/kmeans/SpectralKMeansDriver.java 82daa5b
  core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterCountReader.java 11c4d88
  core/src/main/java/org/apache/mahout/common/AbstractJob.java 55040f6
  core/src/main/java/org/apache/mahout/common/commandline/DefaultOptionCreator.java 868d82f
  core/src/main/java/org/apache/mahout/common/iterator/sequencefile/PathFilters.java 19f78b5
  core/src/main/java/org/apache/mahout/graph/AdjacencyMatrixJob.java ae419f6
  core/src/main/java/org/apache/mahout/graph/linkanalysis/RandomWalk.java 5727a77
  core/src/main/java/org/apache/mahout/graph/linkanalysis/RandomWalkWithRestartJob.java fcf4549
  core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 3e0dd5e
  core/src/main/java/org/apache/mahout/math/hadoop/MatrixColumnMeansJob.java PRE-CREATION
  core/src/main/java/org/apache/mahout/math/hadoop/MatrixMultiplicationJob.java e907a6d
  core/src/main/java/org/apache/mahout/math/hadoop/TransposeJob.java a046b41
  core/src/main/java/org/apache/mahout/math/hadoop/decomposer/DistributedLanczosSolver.java c81ef71
  core/src/main/java/org/apache/mahout/math/hadoop/decomposer/EigenVerificationJob.java 2e152c4
  core/src/main/java/org/apache/mahout/math/hadoop/similarity/SeedVectorUtil.java 4d63f46
  core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/RowSimilarityJob.java ff517dc
  core/src/main/java/org/apache/mahout/math/hadoop/solver/DistributedConjugateGradientSolver.java eba6d2a
  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtDenseOutJob.java c52fe2a
  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java 0c3a996
  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/Omega.java 0fa8707
  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/PartialRowEmitter.java 59bdedb
  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/QJob.java 703c420
  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java d314186
  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java PRE-CREATION
  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototype.java 98c8c59
  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java b1a8b56
  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/UJob.java 53f26f4
  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/VJob.java d58789e
  core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/YtYJob.java bd8c6b1
  core/src/main/java/org/apache/mahout/math/stats/entropy/Entropy.java 4a8078e
  core/src/main/java/org/apache/mahout/vectorizer/collocations/llr/CollocDriver.java 7a0c639
  core/src/test/java/org/apache/mahout/cf/taste/impl/model/PlusAnonymousConcurrentUserDataModelTest.java 984ef6c
  core/src/test/java/org/apache/mahout/clustering/TestClusterClassifier.java 391bdf6
  core/src/test/java/org/apache/mahout/clustering/TestClusterInterface.java d9f06ec
  core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java 0b70339
  core/src/test/java/org/apache/mahout/clustering/classify/ClusterClassificationDriverTest.java 8a5e1ea
  core/src/test/java/org/apache/mahout/clustering/dirichlet/TestDirichletClustering.java d87c3e3
  core/src/test/java/org/apache/mahout/clustering/dirichlet/TestMapReduce.java c996d97
  core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java aa32112
  core/src/test/java/org/apache/mahout/clustering/meanshift/TestMeanShift.java 8dd9d41
  core/src/test/java/org/apache/mahout/common/AbstractJobTest.java 4feae91
  core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 0ef8622
  core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDPCADenseTest.java PRE-CREATION
  core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java 59f79c5
  core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java beb0102
  core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCommonTest.java PRE-CREATION
  core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototypeTest.java 503433f
  core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDTestsHelper.java 32342c1
  examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToPrefsDriver.java 1781481
  examples/src/main/java/org/apache/mahout/classifier/email/PrepEmailVectorsDriver.java 4d4836f
  examples/src/main/java/org/apache/mahout/clustering/display/DisplayClustering.java 7faf92e
  examples/src/main/java/org/apache/mahout/clustering/display/DisplayDirichlet.java 2edadf1
  examples/src/main/java/org/apache/mahout/clustering/display/DisplayFuzzyKMeans.java a5ef4d0
  examples/src/main/java/org/apache/mahout/clustering/display/DisplayKMeans.java bc5c2ea
  examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy/Job.java 3833932
  examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/dirichlet/Job.java 32b9efe
  examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/fuzzykmeans/Job.java 3ac3cca
  examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java d63ac9e
  examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/meanshift/Job.java ef69827
  integration/pom.xml b751b98
  integration/src/main/java/org/apache/mahout/classifier/ConfusionMatrixDumper.java 5958ce8
  integration/src/main/java/org/apache/mahout/utils/MatrixDumper.java b71cb95
  integration/src/main/java/org/apache/mahout/utils/SequenceFileDumper.java e108aa4
  integration/src/main/java/org/apache/mahout/utils/clustering/ClusterDumper.java 3bc72ab
  integration/src/main/java/org/apache/mahout/utils/vectors/RowIdJob.java 11769b1
  integration/src/main/java/org/apache/mahout/utils/vectors/VectorDumper.java 5a9d0f2
  integration/src/main/java/org/apache/mahout/utils/vectors/VectorHelper.java 716aaf9
  integration/src/test/java/org/apache/mahout/clustering/dirichlet/TestL1ModelClustering.java eef9551
  pom.xml 7485994

Diff: https://reviews.apache.org/r/3863/diff


Testing
-------

Additional unit tests for PCA


Thanks,

Dmitriy


                
> Add PCA options to SSVD code
> ----------------------------
>
>                 Key: MAHOUT-817
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-817
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.7
>
>         Attachments: MAHOUT-817.patch, MAHOUT-817.patch, MAHOUT-817.patch, SSVD-PCA options.pdf, ssvd-tests.R, ssvd.R, ssvd.m
>
>
> It seems that a simple solution should exist to integrate PCA mean
> subtraction into the SSVD algorithm without making it a prerequisite step,
> while also avoiding densifying the big input.
> Several approaches were suggested:
> 1) subtract the mean off B
> 2) propagate the mean vector deeper into the algorithm algebraically, where
> the data is already collapsed to smaller matrices
> 3) --?
> It needs some math done first. I'll take a stab at 1 and 2, but thoughts and
> math are welcome.
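Approach 2 rests on a linearity identity. Let mu be the vector of column means, let 1 be the all-ones column vector, and let Omega be the random projection matrix. Then (A - 1*mu^T) * Omega = A*Omega - 1*(mu^T * Omega). The mean therefore enters only through the small k-vector s_Omega = mu^T * Omega, which can be subtracted after the sparse product instead of densifying A.

The Java sketch below checks this for the projection step only. It is illustrative and makes these assumptions:

* The class and method names (MeanCorrectedProjection, meanProjection, projectRow, projectRowDense) are hypothetical, not Mahout's API.
* The column means are already available.
* Sparse rows are represented as plain index/value arrays.

It does not show how the patch's own correctors (s_q, s_b) are applied in later stages.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch (not Mahout's API): folds PCA mean subtraction into the
 * random projection Y = (A - 1*mu^T) * Omega without densifying the sparse rows of A.
 */
class MeanCorrectedProjection {

    /** s_omega = mu^T * Omega, a k-vector computed once and shared by all rows. */
    static double[] meanProjection(double[] mu, double[][] omega) {
        int k = omega[0].length;
        double[] s = new double[k];
        for (int j = 0; j < mu.length; j++) {
            for (int c = 0; c < k; c++) {
                s[c] += mu[j] * omega[j][c];
            }
        }
        return s;
    }

    /** y_i = a_i * Omega - s_omega, touching only the nonzeros of the sparse row a_i. */
    static double[] projectRow(int[] idx, double[] val, double[][] omega, double[] sOmega) {
        int k = omega[0].length;
        double[] y = new double[k];
        for (int p = 0; p < idx.length; p++) {
            for (int c = 0; c < k; c++) {
                y[c] += val[p] * omega[idx[p]][c];
            }
        }
        for (int c = 0; c < k; c++) {
            y[c] -= sOmega[c]; // mean correction applied in the small k-dimensional space
        }
        return y;
    }

    /** Reference: explicitly densify and center the row, then project it. */
    static double[] projectRowDense(int[] idx, double[] val, double[] mu, double[][] omega) {
        double[] centered = new double[mu.length];
        for (int j = 0; j < mu.length; j++) {
            centered[j] = -mu[j];
        }
        for (int p = 0; p < idx.length; p++) {
            centered[idx[p]] += val[p];
        }
        int k = omega[0].length;
        double[] y = new double[k];
        for (int j = 0; j < mu.length; j++) {
            for (int c = 0; c < k; c++) {
                y[c] += centered[j] * omega[j][c];
            }
        }
        return y;
    }

    public static void main(String[] args) {
        int n = 6;
        int k = 3;
        Random rnd = new Random(42);
        double[][] omega = new double[n][k];
        for (double[] row : omega) {
            for (int c = 0; c < k; c++) {
                row[c] = rnd.nextGaussian();
            }
        }
        double[] mu = {0.5, 0.0, 1.5, 0.25, 0.0, 2.0}; // column means, assumed precomputed
        int[] idx = {0, 2, 5};                         // nonzero columns of one sparse row
        double[] val = {1.0, 3.0, 4.0};

        double[] sOmega = meanProjection(mu, omega);
        double[] fast = projectRow(idx, val, omega, sOmega);
        double[] ref = projectRowDense(idx, val, mu, omega);
        double maxDiff = 0.0;
        for (int c = 0; c < k; c++) {
            maxDiff = Math.max(maxDiff, Math.abs(fast[c] - ref[c]));
        }
        System.out.println(Arrays.toString(fast));
        System.out.println("max |diff| vs. densified reference = " + maxDiff);
    }
}
```

In this sketch the per-row cost stays proportional to the number of nonzeros plus k, and the k-vector s_Omega is computed once. Presumably that is the sense in which the correction is pushed to where the data is already small.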

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
