On Feb 25, 2010, at 3:30 PM, Jake Mannix wrote:

> Hmm...
>
> code: *check*
> desire to add stochastic decomp to code: *check*
> amazon credits: *check* (my account today: almost $300 left burning hole in
> pocket)

How hard is it to combine instances across Amazon? That is, I've got ~$250 at the moment, too. I bet we could ask Amazon for some more.

> relatively gigantic social graph: *check*
> legal ability to put gigantic social graph on ec2: not so check, but maybe
> some clever anonymization work on export could be done here.

I'd be a little wary of that, and I'd hate to see anything happen to it (AOL comes to mind). That being said, if you just export the vectors w/o the key, it really is pretty anonymous. What other sources can we get?

> Let's break some records! :)

+1

> -jake
>
> On Thu, Feb 25, 2010 at 12:18 PM, Drew Farris <drew.far...@gmail.com> wrote:
>
>> Sounds pretty interesting. Assuming this is EC2, it would be great if
>> Amazon would pick up the tab, us being an open source project and all,
>> and potentially good marketing to boot. Also, whomever's account is
>> used will have to have its default limit of 20 machines raised.
>>
>> On Thu, Feb 25, 2010 at 3:11 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>
>>> +1 I'm ready. What do we need? Perf tuning? Cluster setup? Amazon credits?
>>> Someone to pay for the machines, or from our own pockets?
>>>
>>> Robin
>>>
>>> On Fri, Feb 26, 2010 at 1:20 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>
>>>> These guys:
>>>>
>>>> http://delivery.acm.org/10.1145/1460000/1459718/a18-vigna.pdf?key1=1459718&key2=4070317621&coll=GUIDE&dl=GUIDE&CFID=77555530&CFTOKEN=13940667
>>>>
>>>> say this:
>>>>
>>>>> We present experiments over a collection with 3.6 billions of
>>>>> postings---two orders of magnitudes larger than any published
>>>>> experiment in the literature.
>>>>
>>>> My impression is that Mahout on about 100 machines is ready to break this
>>>> record with Jake's latest code. The stochastic decomposition should make
>>>> it even more plausible.
>>>>
>>>> The hardest part will be to find reasonable data with > 4 billion
>>>> non-zero entries. At 0.01% sparsity, this is roughly a square matrix
>>>> with 5 million rows and columns.
>>>>
>>>> Jake, your social graph should be much larger than that.
>>>>
>>>> --
>>>> Ted Dunning, CTO
>>>> DeepDyve

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
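[Editor's note: a quick back-of-envelope sketch of the matrix sizing Ted does above, purely illustrative. For a square n-by-n matrix with density d, the non-zero count is d * n^2, so n = sqrt(nnz / d); at 0.01% density and 4 billion non-zeros this works out to roughly 6.3 million rows/columns, the same ballpark as the "roughly 5 million" figure in the thread.]

```python
import math

# Size of a square matrix holding a given number of non-zeros at a given density.
# density d and side n relate to the non-zero count by nnz = d * n^2,
# hence n = sqrt(nnz / d).
nnz = 4e9        # target: > 4 billion non-zero entries
density = 1e-4   # "0.01% sparsity" read as a density of 0.0001

n = math.sqrt(nnz / density)
print(f"square matrix side: ~{n:,.0f} rows/columns")  # ~6.3 million
```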