Hello everyone,

I just implemented an eigensolver for streamed input, based on the article
"Finding structure with randomness" by Halko et al. I couldn't find any way
to do it in a single pass without requiring O(number of observations)
memory, so my version makes two passes over the input (no power iterations).
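
To make the two-pass idea concrete, here is a minimal sketch of one way such
a solver can be structured. It is an illustration under stated assumptions,
not necessarily the implementation described above: the function name, the
`chunks` callable (which must return a fresh iterator over dense
features-by-documents column blocks on each call), and the `oversample`
parameter are all hypothetical. Pass 1 accumulates a random projection of the
input, pass 2 accumulates a small square matrix whose eigendecomposition
yields the factors.

```python
import numpy as np

def two_pass_stochastic_svd(chunks, num_features, rank, oversample=10, seed=0):
    """Approximate the top `rank` left singular vectors and singular values.

    `chunks` is a zero-argument callable (hypothetical interface) returning an
    iterator over dense blocks of shape (num_features, docs_in_block).
    """
    rng = np.random.default_rng(seed)
    width = rank + oversample

    # Pass 1: Y = A * Omega, built block by block. Omega is never stored
    # whole; each block gets its own freshly drawn Gaussian rows.
    y = np.zeros((num_features, width))
    for block in chunks():
        omega = rng.standard_normal((block.shape[1], width))
        y += block @ omega

    # Orthonormal basis for the (approximate) range of A.
    q, _ = np.linalg.qr(y)
    k = q.shape[1]

    # Pass 2: accumulate B * B^T where B = Q^T * A, without storing B itself.
    bbt = np.zeros((k, k))
    for block in chunks():
        b = q.T @ block
        bbt += b @ b.T

    # Eigendecompose the small symmetric matrix and map back through Q.
    evals, evecs = np.linalg.eigh(bbt)
    order = np.argsort(evals)[::-1][:rank]
    singular_values = np.sqrt(np.clip(evals[order], 0.0, None))
    u = q @ evecs[:, order]
    return u, singular_values
```

In this sketch the working memory is O(num_features * (rank + oversample))
and does not grow with the number of observations. A real implementation
would work on sparse blocks and would likely need more care with numerical
accuracy.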

I was looking around to see if there are other streamed (out-of-core) 
implementations, to compare and perhaps get inspiration ;) and I came across 
MAHOUT-309: https://issues.apache.org/jira/browse/MAHOUT-309

That issue seems very quiet, though. How far along did you guys get?

This stochastic algorithm seems pretty fast: 2.5h on the English Wikipedia 
(3.2M documents, 200K features, 0.5G non-zeros) for 400 factors, compared to 
14h for the incremental, non-stochastic one-pass algo, on my MacBook. And I 
think it's more accurate, too, but I'll have to run some more tests.

I'm curious to hear what your experience was, cheers,
Radim
