How about trying a bunch of beefy spot instances from Amazon? I'm willing to partially bankroll it just to see what happens. But again, where do we find a suitable dataset?
On Thu, Feb 25, 2010 at 3:11 PM, Robin Anil <robin.a...@gmail.com> wrote:
> +1 I'm ready. What do we need? Perf tuning? Cluster setup? Amazon credits?
> Someone to pay for the machines, or from our own pockets?
>
> Robin
>
> On Fri, Feb 26, 2010 at 1:20 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> > These guys:
> >
> > http://delivery.acm.org/10.1145/1460000/1459718/a18-vigna.pdf?key1=1459718&key2=4070317621&coll=GUIDE&dl=GUIDE&CFID=77555530&CFTOKEN=13940667
> >
> > say this:
> >
> > > We present experiments over a collection with 3.6 billions of
> > > postings---two orders of magnitudes larger than any published
> > > experiment in the literature.
> >
> > My impression is that Mahout on about 100 machines is ready to break this
> > record with Jake's latest code. The stochastic decomposition should make
> > it even more plausible.
> >
> > The hardest part will be to find reasonable data with > 4 billion
> > non-zero entries. At 0.01% sparsity, this is roughly a square matrix
> > with 5 million rows and columns.
> >
> > Jake, your social graph should be much larger than that.
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve

--
Zaki Rahaman
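For reference, the back-of-envelope arithmetic behind Ted's estimate can be checked directly: a square matrix of side n with density d has about n * n * d non-zeros, so the side needed for a target non-zero count is sqrt(nnz / d). A minimal sketch (the function name is just illustrative, not anything in Mahout):

```python
import math

def square_dim_for_nnz(nnz, density):
    """Side length n of a square matrix whose n * n * density
    entries equal the target number of non-zeros."""
    return math.sqrt(nnz / density)

# Target from the thread: > 4 billion non-zeros at 0.01% density.
n = square_dim_for_nnz(4e9, 1e-4)
print(f"{n:,.0f}")
```

This gives roughly 6.3 million rows and columns for exactly 4 billion non-zeros, i.e. the same order of magnitude as the "roughly 5 million" quoted above (5 million rows/columns at 0.01% density yields about 2.5 billion non-zeros).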