Hello everyone,
I just implemented an eigensolver for streamed input, based on the paper "Finding structure with randomness" by Halko et al. I couldn't find a way to do it in a single pass without requiring O(number of observations) memory, so my version makes two passes over the input (with no power iterations).
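For illustration, a two-pass scheme of this kind can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not necessarily the implementation described above: the names `two_pass_eigs`, `make_stream` and `oversample` are hypothetical, the input is assumed to arrive as dense chunks of shape (num_features x chunk_size) with one document per column, and `make_stream()` is assumed to return a fresh iterator each time so the data can be read twice.

```python
import numpy as np


def two_pass_eigs(make_stream, num_features, k, oversample=10, seed=0):
    """Approximate the top-k left singular vectors and singular values of A.

    A is never held in memory; make_stream() (hypothetical) yields its
    columns in dense chunks of shape (num_features, chunk_size).
    Assumes num_features >= k + oversample.
    """
    rng = np.random.default_rng(seed)
    l = k + oversample  # sketch width: requested factors plus oversampling

    # Pass 1: accumulate Y = A @ Omega chunk by chunk. The Gaussian test
    # matrix Omega is generated per chunk, so it is never stored in full.
    Y = np.zeros((num_features, l))
    for chunk in make_stream():
        omega = rng.standard_normal((chunk.shape[1], l))
        Y += chunk @ omega

    # Orthonormal basis Q for the (approximate) range of A.
    Q, _ = np.linalg.qr(Y)

    # Pass 2: B = Q.T @ A would need O(number of observations) memory,
    # so accumulate the small l x l Gram matrix B @ B.T instead.
    G = np.zeros((l, l))
    for chunk in make_stream():
        b = Q.T @ chunk
        G += b @ b.T

    # Eigenvalues of G are the squared singular values of B.
    evals, evecs = np.linalg.eigh(G)
    order = np.argsort(evals)[::-1][:k]  # indices of the k largest
    s = np.sqrt(np.clip(evals[order], 0.0, None))
    U = Q @ evecs[:, order]
    return U, s
```

In this sketch the working memory is O(num_features x (k + oversample)), independent of the number of observations, because the second pass keeps only the small Gram matrix rather than the projected data. The price is that only the left singular vectors and singular values come out, and that squaring the singular values can cost some accuracy for the smallest ones.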
I was looking around for other streamed (out-of-core) implementations, to compare and perhaps get inspiration ;) and I came across MAHOUT-309: https://issues.apache.org/jira/browse/MAHOUT-309
That issue seems very quiet, though. How far along did you guys get?
This stochastic algorithm seems pretty fast. On my MacBook, it took 2.5h on the English Wikipedia (3.2M documents, 200K features, 0.5G non-zeros) for 400 factors, compared to 14h for the incremental, non-stochastic one-pass algo. I think it's more accurate, too, but I'll have to run some more tests.
I'm curious to hear what your experience was.
Cheers,
Radim
