Sounds pretty interesting. Assuming this is EC2, it would be great if
Amazon would pick up the tab, us being an open source project and all,
and it being potentially good marketing for them to boot. Also, whoever's
account is used will have to have its default limit of 20 machines raised.

On Thu, Feb 25, 2010 at 3:11 PM, Robin Anil <robin.a...@gmail.com> wrote:
> +1 I'm ready. What do we need? Perf tuning? Cluster setup? Amazon credits?
> Someone to pay for the machines, or do we pay from our own pockets?
>
>
> Robin
>
> On Fri, Feb 26, 2010 at 1:20 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> These guys:
>>
>>
>> http://delivery.acm.org/10.1145/1460000/1459718/a18-vigna.pdf?key1=1459718&key2=4070317621&coll=GUIDE&dl=GUIDE&CFID=77555530&CFTOKEN=13940667
>>
>> say this:
>>
>>   > We present experiments over a collection with 3.6 billions of
>>   > postings---two orders of magnitudes larger than any published
>>   > experiment in the literature.
>>
>> My impression is that Mahout on about 100 machines is ready to break this
>> record with Jake's latest code.  The stochastic decomposition should make
>> it even more plausible.
>>
>> The hardest part will be to find reasonable data with > 4 billion non-zero
>> entries.  At 0.01% sparsity, this is roughly a square matrix with 5 million
>> rows and columns.
>>
>> Jake, your social graph should be much larger than that.
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
>
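
For reference, the size estimate quoted above can be checked with a quick
calculation. This is only a minimal sketch: it assumes "0.01% sparsity" means
that 0.01% of the entries are non-zero (a density of 1e-4) and that the matrix
is square. The function names are hypothetical, not from Mahout.

```python
import math

# Assumption: "0.01% sparsity" is read as 0.01% of entries being non-zero.
DENSITY = 1e-4


def nonzeros(n, density=DENSITY):
    """Expected number of non-zero entries in an n x n matrix."""
    return n * n * density


def side_for(target_nnz, density=DENSITY):
    """Side length of a square matrix that holds target_nnz non-zeros."""
    return math.sqrt(target_nnz / density)


if __name__ == "__main__":
    # Non-zeros in a 5 million x 5 million matrix at this density.
    print(f"{nonzeros(5_000_000):.3g}")
    # Side length needed to reach 4 billion non-zeros.
    print(f"{side_for(4e9):.3g}")
```

Under that reading, 5 million rows and columns gives about 2.5 billion
non-zeros, and getting past 4 billion takes a side of roughly 6.3 million.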
