I'm still prototyping to make sure everything works before I roll it out against a large (~500GB) backlog of server data I want to process. As such, I haven't looked seriously into EC2 yet, but I plan to once the test runs are behaving well, hopefully in the next couple of days. I'd be more than happy to write a script to run a Job or to work on a Mahout AMI config.
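To make it concrete, here's a rough, untested sketch of the kind of script I mean. It assumes the stock contrib/ec2 scripts that ship with Hadoop (bin/hadoop-ec2), with hadoop-ec2-env.sh already filled in with your AWS credentials; the cluster name, slave count, job file, and driver class below are all just placeholders:

#!/bin/bash
# Untested sketch: boot a Hadoop cluster on EC2 and run a Mahout job on it.
# Assumes the src/contrib/ec2 scripts from the Hadoop distribution, with
# hadoop-ec2-env.sh configured (AWS keys, AMI, instance type).

CLUSTER=mahout-test                # placeholder cluster name
SLAVES=4                           # placeholder slave count
JOBFILE=mahout-examples-0.1.job    # placeholder; built from the Mahout tree
DRIVER=org.apache.mahout.clustering.kmeans.KMeansDriver  # example driver

# Boot a master plus $SLAVES slave instances
bin/hadoop-ec2 launch-cluster $CLUSTER $SLAVES

# Copy the Mahout job file up to the master
bin/hadoop-ec2 push $CLUSTER $JOBFILE

# Log in to the master and kick off the job there, e.g.:
#   hadoop jar $JOBFILE $DRIVER <input> <output> ...
bin/hadoop-ec2 login $CLUSTER

# Once you log out and the job is done, tear everything down
bin/hadoop-ec2 terminate-cluster $CLUSTER

Obviously the input data would need to be pushed into S3 or HDFS first; I've left that step out here.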
On Wed, Jul 15, 2009 at 5:40 PM, Grant Ingersoll <[email protected]> wrote:

> On Jul 15, 2009, at 5:25 PM, zaki rahaman wrote:
>
>> I hope I'm understanding your setup correctly, but by running on one
>> machine you're not fully exploiting the capabilities of Hadoop's
>> Map/Reduce. Gains in computation time will only be seen by increasing
>> the number of cores or nodes.
>
> Yep.
>
>> If you need access to more computing power, you might want to consider
>> using Amazon's EC2 (they have preconfigured AMIs for Hadoop, but you'd
>> have to configure and install Mahout, a process I'm not totally
>> familiar with yet, as I'm still trying to do it myself).
>
> Please add to http://cwiki.apache.org/MAHOUT/mahoutec2.html if you can.
> Given a Hadoop AMI, it shouldn't be all that hard to set up a Job, I
> wouldn't think. Would be good to have a script that does it, though.
>
> -Grant

--
Zaki Rahaman
