[
https://issues.apache.org/jira/browse/MAHOUT-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087136#comment-13087136
]
Dmitriy Lyubimov edited comment on MAHOUT-771 at 8/18/11 5:40 PM:
------------------------------------------------------------------
I think SSVD in general (in its current form) could benefit from improvements
in flops, since it feels somewhat CPU-bound.
However, the projection code is the smaller part of it: its cost is roughly
linear in the input size (the number of rows, anyway), whereas the bottleneck
is the QR computations, which have a quadratic component that quickly
overshadows any projection expense as the input size grows.
That said, the existing code serves all practical purposes very well, and
Hadoop startup times and occasional unneeded sorts are still much more of a
nuisance than anything else.
was (Author: dlyubimov):
I think SSVD in general (in its current form) could benefit from improvements
in flops, since it feels somewhat CPU-bound.
However, the projection code is the smaller part of it: its cost is roughly
linear in the input size (the number of rows, anyway), whereas the bottleneck
is the QR computations, which have a quadratic component that quickly
overshadows any projection expense as the input size grows.
> Random Projection using sampled values
> --------------------------------------
>
> Key: MAHOUT-771
> URL: https://issues.apache.org/jira/browse/MAHOUT-771
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Reporter: Lance Norskog
> Priority: Minor
> Attachments: RandomProjector.patch, RandomProjectorBenchmark.java
>
>
> Random Projection implementation which provides two determinism guarantees:
> # The same data projected multiple times produces the same output
> # Dense and sparse data with the same contents produce the same output
> Custom class that does Random Projection based on the Johnson-Lindenstrauss
> lemma. This implementation uses Achlioptas's results, which allow using
> methods other than a full-range random multiplier per sample:
> * use 1 random bit to either add a sample to a row sum or subtract it
> * use a random draw from six equally likely outcomes to add (1 out of 6),
> subtract (1 out of 6), or ignore (4 out of 6) a sample in a row sum
> Custom implementations for both dense and sparse vectors are included. The
> sparse vector implementation assumes the active values will fit in memory.
> An implementation using full-range random multipliers made by
> java.util.Random is included for reference/research.
> *Database-friendly random projections: Johnson-Lindenstrauss with binary
> coins*
> _Dimitris Achlioptas_
> [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.4546&rep=rep1&type=pdf]
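
To make the description above concrete, here is a minimal sketch of the
add / subtract / ignore scheme together with the two determinism guarantees.
It is not the code from RandomProjector.patch: the class name, the method
names, the hash-based sign function, the seed parameter, and the sqrt(3/k)
scaling (taken from the Achlioptas paper) are all assumptions made for
illustration. The idea is that each sign depends only on (seed, output row,
input index), so repeated runs agree, and a dense and a sparse vector with the
same contents see the same signs.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch, not the attached patch.
final class AchlioptasSketch {

    // SplitMix64-style finalizer: mixes (seed, row, col) into 64 pseudo-random bits.
    static long mix(long seed, int row, int col) {
        long z = seed + 0x9E3779B97F4A7C15L * (((long) row << 32) ^ (col & 0xFFFFFFFFL));
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Returns +1 for 1 bucket of 6, -1 for 1 bucket of 6, 0 for the other 4.
    // (2^64 is not a multiple of 6, so the buckets are only approximately equal.)
    static int sign(long seed, int row, int col) {
        long bucket = Long.remainderUnsigned(mix(seed, row, col), 6L);
        return bucket == 0 ? 1 : (bucket == 1 ? -1 : 0);
    }

    // Dense input: visit every index in ascending order.
    static double[] projectDense(double[] x, int k, long seed) {
        double scale = Math.sqrt(3.0 / k);
        double[] y = new double[k];
        for (int row = 0; row < k; row++) {
            double sum = 0.0;
            for (int col = 0; col < x.length; col++) {
                int s = sign(seed, row, col);
                if (s > 0) sum += x[col];
                else if (s < 0) sum -= x[col];
            }
            y[row] = scale * sum;
        }
        return y;
    }

    // Sparse input: visit only the stored entries, also in ascending index
    // order, so the floating-point sums match the dense path.
    static double[] projectSparse(SortedMap<Integer, Double> x, int k, long seed) {
        double scale = Math.sqrt(3.0 / k);
        double[] y = new double[k];
        for (int row = 0; row < k; row++) {
            double sum = 0.0;
            for (Map.Entry<Integer, Double> e : x.entrySet()) {
                int s = sign(seed, row, e.getKey());
                if (s > 0) sum += e.getValue();
                else if (s < 0) sum -= e.getValue();
            }
            y[row] = scale * sum;
        }
        return y;
    }

    public static void main(String[] args) {
        double[] dense = {1.0, 0.0, -2.5, 0.0, 4.0, 0.0, 0.0, 3.0};
        SortedMap<Integer, Double> sparse = new TreeMap<>();
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] != 0.0) sparse.put(i, dense[i]);
        }
        System.out.println(Arrays.toString(projectDense(dense, 4, 42L)));
        System.out.println(Arrays.toString(projectSparse(sparse, 4, 42L)));
    }
}
```

Hashing the indices instead of drawing from a stateful generator is one way to
keep the sparse path from depending on how many entries were skipped; a
sequential java.util.Random would have to be advanced past every zero to stay
in step with the dense path.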
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira