[ https://issues.apache.org/jira/browse/MAHOUT-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210530#comment-13210530 ]
[email protected] commented on MAHOUT-817:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3863/
-----------------------------------------------------------
(Updated 2012-02-17 20:38:49.925577)
Review request for mahout.
Changes
-------
commit cd4862738fb74f01114e0e4c2fee8a737a009c13
Author: Dmitriy Lyubimov <[email protected]>
Date: Fri Feb 17 12:35:47 2012 -0800
Getting rid of prototype code; styling round
:100644 100644 d61210f... ebf087d... M core/src/main/java/org/apache/mahout/math/hadoop/MatrixColumnMeansJob.java
:100644 100644 254887a... d9c03cb... M core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java
:100644 100644 959d491... 8be8df1... M core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/Omega.java
:100644 000000 59bdedb... 0000000... D core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/PartialRowEmitter.java
:100644 100644 d247af4... 59f64ba... M core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java
:100644 100644 96fe5e1... 1127f6a... M core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java
:100644 000000 09f05d1... 0000000... D core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototype.java
:100644 100644 915fce5... 4168e98... M core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java
:100644 100644 885f5fa... 1346d71... M core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDPCADenseTest.j
:100644 100644 760c715... 280e10a... M core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTes
:100644 100644 7015283... 0e34568... M core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSe
:000000 100644 0000000... 5bb5706... A core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCommonTest.java
:100644 000000 503433f... 0000000... D core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototypeTest.java
:100644 100644 32342c1... d6605c1... M core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDTestsHelper.java
Summary
-------
2d542fd4dfcc6e01577bddc28600632a88e358ee Merge remote-tracking branch
'apache/trunk' into MAHOUT-817
1f245bb5cc1354e7495ec62fbc5f41ed6d590210 Merge branch 'trunk' into MAHOUT-817
458d8112de180c93d5194d67ccfc00442ed1d460 Merge remote-tracking branch
'apache/trunk' into MAHOUT-817
3fea9bd981043e268dd003d4c6c3943bb570c0f7 added test, bug fixes
2725c1061c167126238d288039f0f68baafa7dc8 adding --pca and --pcaOffset options,
minor fixes
48c7b425241afff42ce52d3bb005a87aeb68386d fixing front end to factor in the
median data.
4e072615ac2b8a256d037aaf00db21820abb91e2 tweaking B' job to produce necessary
correctors s_q and s_b
b10fefd8d4aa5a0ed2f60902904d551afbbdf57e cosmetic fixes
849171d3af75117a2ee1115e6d5fc8e4a1fff5ce comment
6c196ea9606b3ca05d401fa1474ee9262a6c0303 retrofitting V job to do pca correction
e6fbe7cdb606698f180127302c33d30fffc6c4d7 adding pca options to Q,ABt jobs.
still need to work on B'-job, V-job and front-end pca corrections.
ecf5dd21c5d5805d70715a78abd07246d171536c Computing s_b0
b9b33cf72af85ade16fcfbf4e13a036877489afb comments
9bb6e971c68e0674b087b8c5d64f4967878f1834 More cleanup in favor of standard
functions, unit tests pass but need to verify the 2G benchmark.
39faa70158b52e50d31aca2abc4006874a9ea8fd cleanup I
780b291eb902e0e832d41748d45bf6d2163f9537 cosmetic changes, adding api without
redundant parameters
02daf0024489305032320c578ac546c16bda31c1 current MAHOUT-923 patch from Raphael
This addresses bug MAHOUT-817.
https://issues.apache.org/jira/browse/MAHOUT-817
Diffs (updated)
-----
core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/DatasetSplitter.java
c9003ad
core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/FactorizationEvaluator.java
0c6e3f7
core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/ParallelALSFactorizationJob.java
7dc3b79
core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/RecommenderJob.java
9ca0b16
core/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java
1feaa03
core/src/main/java/org/apache/mahout/cf/taste/hadoop/preparation/PreparePreferenceMatrixJob.java
fbe8914
core/src/main/java/org/apache/mahout/cf/taste/hadoop/pseudo/RecommenderJob.java
02d1ba6
core/src/main/java/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.java
951c860
core/src/main/java/org/apache/mahout/cf/taste/hadoop/slopeone/SlopeOneAverageDiffsJob.java
57fa036
core/src/main/java/org/apache/mahout/cf/taste/impl/model/PlusAnonymousConcurrentUserDataModel.java
11eb295
core/src/main/java/org/apache/mahout/cf/taste/impl/model/PlusAnonymousUserDataModel.java
7f9cfd4
core/src/main/java/org/apache/mahout/classifier/naivebayes/test/TestNaiveBayesDriver.java
15da502
core/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainNaiveBayesJob.java
4da6426
core/src/main/java/org/apache/mahout/clustering/AbstractCluster.java 2ceb01b
core/src/main/java/org/apache/mahout/clustering/CIMapper.java 5f25f4f
core/src/main/java/org/apache/mahout/clustering/CIReducer.java 726363e
core/src/main/java/org/apache/mahout/clustering/Cluster.java 2f8d4dd
core/src/main/java/org/apache/mahout/clustering/ClusterIterator.java e39c71e
core/src/main/java/org/apache/mahout/clustering/ClusterWritable.java dba8c37
core/src/main/java/org/apache/mahout/clustering/ClusteringPolicy.java b07b649
core/src/main/java/org/apache/mahout/clustering/ClusteringPolicyWritable.java
8c148a8
core/src/main/java/org/apache/mahout/clustering/DirichletClusteringPolicy.java
116973f
core/src/main/java/org/apache/mahout/clustering/FuzzyKMeansClusteringPolicy.java
6c39d94
core/src/main/java/org/apache/mahout/clustering/KMeansClusteringPolicy.java
7b0d874
core/src/main/java/org/apache/mahout/clustering/Model.java 79dab30
core/src/main/java/org/apache/mahout/clustering/WeightedPropertyVectorWritable.java
92373eb
core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java
7147015
core/src/main/java/org/apache/mahout/clustering/canopy/CanopyMapper.java
52fe865
core/src/main/java/org/apache/mahout/clustering/canopy/CanopyReducer.java
ca814f9
core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationConfigKeys.java
366ec3c
core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationDriver.java
49a9cfc
core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationMapper.java
09be170
core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletCluster.java
7293479
core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterer.java
3cf25bc
core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletState.java
d19842f
core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansClusterer.java
2d882b0
core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansDriver.java
aa7389f
core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansUtil.java
5f6cb47
core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/SoftCluster.java
52fd764
core/src/main/java/org/apache/mahout/clustering/kmeans/Cluster.java
PRE-CREATION
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterMapper.java
3cf41ec
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterer.java
9471e74
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansCombiner.java
eb086d8
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java
1099206
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansMapper.java
0945dcb
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansReducer.java
bb777a4
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansUtil.java
1c84f87
core/src/main/java/org/apache/mahout/clustering/kmeans/Kluster.java 8b22709
core/src/main/java/org/apache/mahout/clustering/kmeans/RandomSeedGenerator.java
4a725e7
core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopy.java
28fc43b
core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyDriver.java
a33f1ca
core/src/main/java/org/apache/mahout/clustering/spectral/eigencuts/EigencutsDriver.java
06e0549
core/src/main/java/org/apache/mahout/clustering/spectral/kmeans/SpectralKMeansDriver.java
82daa5b
core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterCountReader.java
11c4d88
core/src/main/java/org/apache/mahout/common/AbstractJob.java 55040f6
core/src/main/java/org/apache/mahout/common/commandline/DefaultOptionCreator.java
868d82f
core/src/main/java/org/apache/mahout/common/iterator/sequencefile/PathFilters.java
19f78b5
core/src/main/java/org/apache/mahout/graph/AdjacencyMatrixJob.java ae419f6
core/src/main/java/org/apache/mahout/graph/linkanalysis/RandomWalk.java
5727a77
core/src/main/java/org/apache/mahout/graph/linkanalysis/RandomWalkWithRestartJob.java
fcf4549
core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java
3e0dd5e
core/src/main/java/org/apache/mahout/math/hadoop/MatrixColumnMeansJob.java
PRE-CREATION
core/src/main/java/org/apache/mahout/math/hadoop/MatrixMultiplicationJob.java
e907a6d
core/src/main/java/org/apache/mahout/math/hadoop/TransposeJob.java a046b41
core/src/main/java/org/apache/mahout/math/hadoop/decomposer/DistributedLanczosSolver.java
c81ef71
core/src/main/java/org/apache/mahout/math/hadoop/decomposer/EigenVerificationJob.java
2e152c4
core/src/main/java/org/apache/mahout/math/hadoop/similarity/SeedVectorUtil.java
4d63f46
core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/RowSimilarityJob.java
ff517dc
core/src/main/java/org/apache/mahout/math/hadoop/solver/DistributedConjugateGradientSolver.java
eba6d2a
core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtDenseOutJob.java
c52fe2a
core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java
0c3a996
core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/Omega.java
0fa8707
core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/PartialRowEmitter.java
59bdedb
core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/QJob.java
703c420
core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java
d314186
core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java
PRE-CREATION
core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototype.java
98c8c59
core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java
b1a8b56
core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/UJob.java
53f26f4
core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/VJob.java
d58789e
core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/YtYJob.java
bd8c6b1
core/src/main/java/org/apache/mahout/math/stats/entropy/Entropy.java 4a8078e
core/src/main/java/org/apache/mahout/vectorizer/collocations/llr/CollocDriver.java
7a0c639
core/src/test/java/org/apache/mahout/cf/taste/impl/model/PlusAnonymousConcurrentUserDataModelTest.java
984ef6c
core/src/test/java/org/apache/mahout/clustering/TestClusterClassifier.java
391bdf6
core/src/test/java/org/apache/mahout/clustering/TestClusterInterface.java
d9f06ec
core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java
0b70339
core/src/test/java/org/apache/mahout/clustering/classify/ClusterClassificationDriverTest.java
8a5e1ea
core/src/test/java/org/apache/mahout/clustering/dirichlet/TestDirichletClustering.java
d87c3e3
core/src/test/java/org/apache/mahout/clustering/dirichlet/TestMapReduce.java
c996d97
core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java
aa32112
core/src/test/java/org/apache/mahout/clustering/meanshift/TestMeanShift.java
8dd9d41
core/src/test/java/org/apache/mahout/common/AbstractJobTest.java 4feae91
core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java
0ef8622
core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDPCADenseTest.java
PRE-CREATION
core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java
59f79c5
core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java
beb0102
core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCommonTest.java
PRE-CREATION
core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototypeTest.java
503433f
core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDTestsHelper.java
32342c1
examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToPrefsDriver.java
1781481
examples/src/main/java/org/apache/mahout/classifier/email/PrepEmailVectorsDriver.java
4d4836f
examples/src/main/java/org/apache/mahout/clustering/display/DisplayClustering.java
7faf92e
examples/src/main/java/org/apache/mahout/clustering/display/DisplayDirichlet.java
2edadf1
examples/src/main/java/org/apache/mahout/clustering/display/DisplayFuzzyKMeans.java
a5ef4d0
examples/src/main/java/org/apache/mahout/clustering/display/DisplayKMeans.java
bc5c2ea
examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy/Job.java
3833932
examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/dirichlet/Job.java
32b9efe
examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/fuzzykmeans/Job.java
3ac3cca
examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java
d63ac9e
examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/meanshift/Job.java
ef69827
integration/pom.xml b751b98
integration/src/main/java/org/apache/mahout/classifier/ConfusionMatrixDumper.java
5958ce8
integration/src/main/java/org/apache/mahout/utils/MatrixDumper.java b71cb95
integration/src/main/java/org/apache/mahout/utils/SequenceFileDumper.java
e108aa4
integration/src/main/java/org/apache/mahout/utils/clustering/ClusterDumper.java
3bc72ab
integration/src/main/java/org/apache/mahout/utils/vectors/RowIdJob.java
11769b1
integration/src/main/java/org/apache/mahout/utils/vectors/VectorDumper.java
5a9d0f2
integration/src/main/java/org/apache/mahout/utils/vectors/VectorHelper.java
716aaf9
integration/src/test/java/org/apache/mahout/clustering/dirichlet/TestL1ModelClustering.java
eef9551
pom.xml 7485994
Diff: https://reviews.apache.org/r/3863/diff
Testing
-------
Additional unit tests for PCA
Thanks,
Dmitriy
> Add PCA options to SSVD code
> ----------------------------
>
> Key: MAHOUT-817
> URL: https://issues.apache.org/jira/browse/MAHOUT-817
> Project: Mahout
> Issue Type: New Feature
> Affects Versions: 0.6
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Fix For: 0.7
>
> Attachments: MAHOUT-817.patch, MAHOUT-817.patch, MAHOUT-817.patch,
> SSVD-PCA options.pdf, ssvd-tests.R, ssvd.R, ssvd.m
>
>
> It seems that a simple solution should exist to integrate PCA mean
> subtraction into the SSVD algorithm without making it a prerequisite step
> and without densifying the big input.
> Several approaches were suggested:
> 1) subtract the mean off B
> 2) propagate the mean vector deeper into the algorithm algebraically, to
> where the data is already collapsed into smaller matrices
> 3) --?
> It needs some math done first. I'll take a stab at 1 and 2, but thoughts and
> math are welcome.
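
The algebra behind approach 2 can be illustrated on a toy example. The sketch below is not Mahout code and does not claim to reproduce what the patch does; every name in it (matmul, column_means, centered_product_dense, centered_product_corrected, A, OMEGA) is hypothetical. It only checks the general identity (A - 1*mean') * Omega = A * Omega - 1 * (mean' * Omega), which suggests that a product with the mean-centered input can be obtained from the original sparse input plus a small rank-one correction, so the big matrix never has to be densified.

```python
# Sketch only: pure-Python check of the rank-one correction identity.
# All names are hypothetical illustrations, not Mahout APIs.

def matmul(x, y):
    """Plain dense product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def column_means(a):
    """Mean of every column of a."""
    rows = len(a)
    return [sum(col) / rows for col in zip(*a)]

def centered_product_dense(a, omega):
    """Reference path: subtract the mean from A (densifying it), then multiply."""
    mean = column_means(a)
    a_centered = [[v - m for v, m in zip(row, mean)] for row in a]
    return matmul(a_centered, omega)

def centered_product_corrected(a, omega):
    """Multiply the original A, then subtract the same correction row from every row."""
    mean = column_means(a)
    y = matmul(a, omega)
    correction = matmul([mean], omega)[0]  # mean' * Omega, a single row of length k
    return [[v - c for v, c in zip(row, correction)] for row in y]

# Small sparse-looking input and an arbitrary projection matrix.
A = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 0.0],
     [4.0, 0.0, 0.0],
     [0.0, 0.0, 5.0]]
OMEGA = [[1.0, -1.0],
         [0.5, 2.0],
         [-2.0, 0.25]]

dense = centered_product_dense(A, OMEGA)
corrected = centered_product_corrected(A, OMEGA)

# Largest entrywise difference between the two paths.
max_diff = max(abs(d - c) for dr, cr in zip(dense, corrected) for d, c in zip(dr, cr))
```

Here max_diff should be zero up to floating-point rounding. The corrected path touches A only through an ordinary product, and the correction term costs one extra row-times-matrix product regardless of how many rows A has.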