[
https://issues.apache.org/jira/browse/MAHOUT-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003462#comment-13003462
]
Ted Dunning commented on MAHOUT-309:
------------------------------------
Yes. Duplicated and superseded entirely.
> Implement Stochastic Decomposition
> ----------------------------------
>
> Key: MAHOUT-309
> URL: https://issues.apache.org/jira/browse/MAHOUT-309
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Affects Versions: 0.4
> Reporter: Jake Mannix
> Assignee: Ted Dunning
> Fix For: 0.5
>
>
> Techniques reviewed in Halko, Martinsson, and Tropp
> (http://arxiv.org/abs/0909.4061).
> The basic idea of the implementation is as follows: the input matrix is
> represented as a DistributedSparseRowMatrix (backed by a sequence file of
> <Writable,VectorWritable> pairs, the values of which should be
> SequentialAccessSparseVector instances for best performance). You may
> optionally have a kernel function f(v) that maps sparse
> numColumns-dimensional vectors (numColumns is unconstrained in size) to
> sparse numKernelizedFeatures-dimensional vectors (also unconstrained in
> size) - for example, to do kernel-PCA with a kernel
> k(u,v) = f(u).dot( f(v) ). A MurmurHash-based projection (from MAHOUT-228)
> then maps the numKernelizedFeatures-dimensional vectors down to some
> reasonably-sized numHashedFeatures-dimensional space (no more than about
> 10^2 to 10^4 dimensions).
> This is all done in the Mapper, and there are two outputs: the
> numHashedFeatures-dimensional vector itself (emitted only if the
> left-singular vectors are ever desired), which does not need to be
> Reduced, and the outer product of this vector with itself. The
> Reducer/Combiner simply sums these partial outer products, eventually
> producing the kernel / gram matrix of the hashed features. That small
> matrix can then be run through a simple eigen-decomposition, and its
> ((1/eigenvalue)-scaled) eigenvectors can be applied to project the
> (optional) numHashedFeatures-dimensional Mapper outputs mentioned above,
> yielding the left-singular vectors / reduced projections (which can then
> be run through clustering, etc.).
> Good fun will be had by all.
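The pipeline quoted above can be sketched in a few lines of NumPy. This is an illustrative single-machine analogue, not Mahout's Java API: hash_project stands in for the MurmurHash projection of MAHOUT-228, the list comprehension plays the Mapper, the outer-product sum plays the Reducer/Combiner, and the eigenvectors are scaled by 1/sqrt(eigenvalue) here (rather than 1/eigenvalue) so the recovered left-singular columns come out orthonormal. All names and the toy data are made up for the example.

```python
import numpy as np

def hash_project(v, num_hashed, seed=42):
    """Project a sparse vector, given as (index, value) pairs, down to
    num_hashed dimensions via feature hashing - a stand-in for the
    MurmurHash projection described in MAHOUT-228."""
    out = np.zeros(num_hashed)
    for idx, val in v:
        h = hash((seed, idx))          # deterministic for int tuples
        bucket = h % num_hashed        # which hashed feature to hit
        sign = 1.0 if (h >> 1) % 2 == 0 else -1.0
        out[bucket] += sign * val
    return out

# Toy corpus: each row is a sparse vector of (index, value) pairs,
# like the values of the <Writable,VectorWritable> sequence file.
rows = [[(0, 1.0), (5, 2.0)],
        [(1, 3.0), (5, 1.0)],
        [(0, 2.0), (2, 1.0)]]
k = 4  # numHashedFeatures

# "Mapper": hash-project every row.
hashed = np.array([hash_project(r, k) for r in rows])

# "Reducer/Combiner": sum the outer products into the k x k gram matrix.
gram = sum(np.outer(h, h) for h in hashed)

# Eigen-decompose the small gram matrix (gram = V diag(lambda) V^T).
eigvals, eigvecs = np.linalg.eigh(gram)

# Project the retained hashed vectors through the scaled eigenvectors to
# recover (approximate, orthonormal) left-singular vectors; 1/sqrt(lambda)
# is used here so that left.T @ left is the identity on the nonzero rank.
nonzero = eigvals > 1e-12
left = hashed @ eigvecs[:, nonzero] / np.sqrt(eigvals[nonzero])
```

Because the gram matrix is only numHashedFeatures on a side, the eigen-decomposition is cheap no matter how large the original numColumns was; the rows of `left` are what you would then feed to clustering.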
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira