I have been doing factorizations (SVD, not NMF) in R on sparse matrices of about this size. Stochastic decomposition algorithms are incredibly fast on data of this size and in many cases don't even need a block decomposition. I am computing relatively few singular values (about 30), but most of the SVDs take only a fraction of a second.
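For concreteness, here is a minimal sketch of the kind of randomized ("stochastic") truncated SVD I mean, using only base R and the Matrix package. The function name rand_svd, the oversampling parameter p, and the toy dimensions are just for illustration, not a particular library's API; a real run would also add a power iteration or two for better accuracy on slowly decaying spectra.

    library(Matrix)

    # Randomized truncated SVD (range-finder + small dense SVD), a minimal sketch.
    # A: sparse Matrix (e.g. dgCMatrix), k: number of singular values, p: oversampling.
    rand_svd <- function(A, k = 30, p = 10) {
      n <- ncol(A)
      Omega <- matrix(rnorm(n * (k + p)), nrow = n)  # random Gaussian test matrix
      Y <- A %*% Omega                               # sample the range of A; m x (k+p), small and dense
      Q <- qr.Q(qr(as.matrix(Y)))                    # orthonormal basis for that sample
      B <- t(Q) %*% A                                # project A down to (k+p) x n
      s <- svd(as.matrix(B), nu = k, nv = k)         # cheap dense SVD of the small matrix
      list(u = Q %*% s$u, d = s$d[1:k], v = s$v)
    }

    # Toy example; the real 3,000,000 x 70,000 matrix with ~60M non-zeros would be
    # built the same way with sparseMatrix(i, j, x, dims = c(3e6, 7e4)).
    set.seed(1)
    nnz <- 200000
    A <- sparseMatrix(i = sample(10000, nnz, replace = TRUE),
                      j = sample(2000,  nnz, replace = TRUE),
                      x = rnorm(nnz), dims = c(10000, 2000))
    res <- rand_svd(A, k = 30)
    head(res$d)

The point is that the only dense objects are m x (k+p) and (k+p) x n, so with k around 30 the whole thing stays small even at the sizes in this thread.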
On Mon, Apr 26, 2010 at 11:14 AM, Jake Mannix <jake.man...@gmail.com> wrote:
> > I have a matrix that has 3,000,000 by 70,000 entries, however it is very
> > sparse. It could be broken down to 60,000,000 non-zero data points.
> >
> > 2. Am I better off using R, than Mahout?
> >
> 60 million doubles as a data set fits in memory (~0.5GB), and depending on
> what algorithm you use, if you stay sparse, you should be fine in R. If
> you do something which has dense intermediate results, you'll be toast,
> however.
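One back-of-envelope note on the ~0.5GB figure above: that counts only the 8 bytes per double for the values. A dgCMatrix in R (compressed sparse column) also stores a 4-byte integer row index per non-zero plus a column-pointer array, so the working estimate is a bit higher:

    nnz  <- 60e6    # non-zeros quoted above
    ncol <- 70000
    values_MB    <- nnz * 8 / 2^20                            # doubles only: ~458 MB (the "~0.5GB")
    dgCMatrix_MB <- (nnz * (8 + 4) + (ncol + 1) * 4) / 2^20   # + row indices + column pointers
    c(values_MB = values_MB, dgCMatrix_MB = dgCMatrix_MB)     # roughly 458 vs ~687 MB

Still comfortably in memory on a single machine, as long as nothing densifies along the way.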