On Feb 25, 2010, at 3:30 PM, Jake Mannix wrote:

> Hmm...
>
> code: *check*
> desire to add stochastic decomp to code: *check*
> amazon credits: *check* (my account today: almost $300 left burning hole in
> pocket)

How hard is it to combine instances across Amazon? That is, I've got ~$250 at the moment, too. I bet we could ask Amazon for some more.

> relatively gigantic social graph: *check*
> legal ability to put gigantic social graph on ec2: not so check, but maybe
> some clever anonymization work on export could be done here.

I'd be a little wary of that, and I'd hate to see anything happen to it (AOL comes to mind). That being said, if you just export the vectors w/o the key, it really is pretty anonymous. What other sources can we get?

> Let's break some records! :)

+1

> -jake
>
> On Thu, Feb 25, 2010 at 12:18 PM, Drew Farris <drew.far...@gmail.com> wrote:
>
>> Sounds pretty interesting. Assuming this is EC2, it would be great if
>> Amazon would pick up the tab, us being an open source project and all,
>> and potentially good marketing to boot. Also, whomever's account is
>> used will have to have its default limit of 20 machines raised.
>>
>> On Thu, Feb 25, 2010 at 3:11 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>
>>> +1 I'm ready. What do we need? Perf tuning? Cluster setup? Amazon credits?
>>> Someone to pay for the machines, or from our own pockets?
>>>
>>> Robin
>>>
>>> On Fri, Feb 26, 2010 at 1:20 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>
>>>> These guys:
>>>>
>>>> http://delivery.acm.org/10.1145/1460000/1459718/a18-vigna.pdf?key1=1459718&key2=4070317621&coll=GUIDE&dl=GUIDE&CFID=77555530&CFTOKEN=13940667
>>>>
>>>> say this:
>>>>
>>>>> We present experiments over a collection with 3.6 billions of
>>>>> postings---two orders of magnitudes larger than any published
>>>>> experiment in the literature.
>>>>
>>>> My impression is that Mahout on about 100 machines is ready to break this
>>>> record with Jake's latest code. The stochastic decomposition should make
>>>> it even more plausible.
>>>>
>>>> The hardest part will be to find reasonable data with > 4 billion
>>>> non-zero entries. At 0.01% sparsity, this is roughly a square matrix
>>>> with 5 million rows and columns.
>>>>
>>>> Jake, your social graph should be much larger than that.
>>>>
>>>> --
>>>> Ted Dunning, CTO
>>>> DeepDyve

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
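[Editor's note: a quick back-of-envelope sketch of the matrix sizing Ted does above, purely illustrative. For a square n-by-n matrix with density d, the non-zero count is d * n^2, so n = sqrt(nnz / d); at 0.01% density and 4 billion non-zeros this works out to roughly 6.3 million rows/columns, the same ballpark as the "roughly 5 million" figure in the thread.]

```python
import math

# Size of a square matrix holding a given number of non-zeros at a given density.
# density d and side n relate to the non-zero count by nnz = d * n^2,
# hence n = sqrt(nnz / d).
nnz = 4e9        # target: > 4 billion non-zero entries
density = 1e-4   # "0.01% sparsity" read as a density of 0.0001

n = math.sqrt(nnz / density)
print(f"square matrix side: ~{n:,.0f} rows/columns")  # ~6.3 million
```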