[
https://issues.apache.org/jira/browse/MAHOUT-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003462#comment-13003462
]
Ted Dunning commented on MAHOUT-309:
------------------------------------
Yes. Duplicated and superseded entirely.
> Implement Stochastic Decomposition
> ----------------------------------
>
> Key: MAHOUT-309
> URL: https://issues.apache.org/jira/browse/MAHOUT-309
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Affects Versions: 0.4
> Reporter: Jake Mannix
> Assignee: Ted Dunning
> Fix For: 0.5
>
>
> Techniques reviewed in Halko, Martinsson, and Tropp
> (http://arxiv.org/abs/0909.4061).
> The basic idea of the implementation is as follows: the input matrix is
> represented as a DistributedSparseRowMatrix (backed by a sequence file of
> <Writable,VectorWritable> pairs, the values of which should be
> SequentialAccessSparseVector instances for best performance). You may
> optionally have a kernel function f(v) that maps sparse
> numColumns-dimensional vectors (numColumns is unconstrained in size) to
> sparse numKernelizedFeatures-dimensional vectors (also unconstrained in
> size) - for example, to do kernel-PCA with a kernel
> k(u,v) = f(u).dot( f(v) ). A MurmurHash-based projection (from MAHOUT-228)
> then maps the numKernelizedFeatures-dimensional vectors down to some
> reasonably-sized numHashedFeatures-dimensional space (no more than about
> 10^2 to 10^4 dimensions).
> This is all done in the Mapper, and there are two outputs: the
> numHashedFeatures-dimensional vector itself (emitted only if the
> left-singular vectors are ever desired), which does not need to be
> Reduced, and the outer product of this vector with itself. The
> Reducer/Combiner simply sums these partial outer products, eventually
> producing the kernel / gram matrix of the hashed features. That small
> matrix can then be run through a simple eigen-decomposition, and its
> ((1/eigenvalue)-scaled) eigenvectors can be applied to project the
> (optional) numHashedFeatures-dimensional Mapper outputs mentioned above,
> yielding the left-singular vectors / reduced projections (which can then
> be run through clustering, etc.).
> Good fun will be had by all.
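The pipeline quoted above can be sketched in a few lines of NumPy. This is an illustrative single-machine analogue, not Mahout's Java API: hash_project stands in for the MurmurHash projection of MAHOUT-228, the list comprehension plays the Mapper, the outer-product sum plays the Reducer/Combiner, and the eigenvectors are scaled by 1/sqrt(eigenvalue) here (rather than 1/eigenvalue) so the recovered left-singular columns come out orthonormal. All names and the toy data are made up for the example.

```python
import numpy as np

def hash_project(v, num_hashed, seed=42):
    """Project a sparse vector, given as (index, value) pairs, down to
    num_hashed dimensions via feature hashing - a stand-in for the
    MurmurHash projection described in MAHOUT-228."""
    out = np.zeros(num_hashed)
    for idx, val in v:
        h = hash((seed, idx))          # deterministic for int tuples
        bucket = h % num_hashed        # which hashed feature to hit
        sign = 1.0 if (h >> 1) % 2 == 0 else -1.0
        out[bucket] += sign * val
    return out

# Toy corpus: each row is a sparse vector of (index, value) pairs,
# like the values of the <Writable,VectorWritable> sequence file.
rows = [[(0, 1.0), (5, 2.0)],
        [(1, 3.0), (5, 1.0)],
        [(0, 2.0), (2, 1.0)]]
k = 4  # numHashedFeatures

# "Mapper": hash-project every row.
hashed = np.array([hash_project(r, k) for r in rows])

# "Reducer/Combiner": sum the outer products into the k x k gram matrix.
gram = sum(np.outer(h, h) for h in hashed)

# Eigen-decompose the small gram matrix (gram = V diag(lambda) V^T).
eigvals, eigvecs = np.linalg.eigh(gram)

# Project the retained hashed vectors through the scaled eigenvectors to
# recover (approximate, orthonormal) left-singular vectors; 1/sqrt(lambda)
# is used here so that left.T @ left is the identity on the nonzero rank.
nonzero = eigvals > 1e-12
left = hashed @ eigvecs[:, nonzero] / np.sqrt(eigvals[nonzero])
```

Because the gram matrix is only numHashedFeatures on a side, the eigen-decomposition is cheap no matter how large the original numColumns was; the rows of `left` are what you would then feed to clustering.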
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira