I'm still prototyping to make sure everything works before I roll it out against a large (~500GB) backlog of server data I want to process. As such, I haven't looked seriously into EC2 yet, but I plan to once the test runs are behaving well, hopefully in the next couple of days. I'd be more than happy to write a script to run a Job or to work on a Mahout AMI config.
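To make it concrete, here's a rough, untested sketch of the kind of script I mean. It assumes the stock contrib/ec2 scripts that ship with Hadoop (bin/hadoop-ec2), with hadoop-ec2-env.sh already filled in with your AWS credentials; the cluster name, slave count, job file, and driver class below are all just placeholders:

#!/bin/bash
# Untested sketch: boot a Hadoop cluster on EC2 and run a Mahout job on it.
# Assumes the src/contrib/ec2 scripts from the Hadoop distribution, with
# hadoop-ec2-env.sh configured (AWS keys, AMI, instance type).

CLUSTER=mahout-test                # placeholder cluster name
SLAVES=4                           # placeholder slave count
JOBFILE=mahout-examples-0.1.job    # placeholder; built from the Mahout tree
DRIVER=org.apache.mahout.clustering.kmeans.KMeansDriver  # example driver

# Boot a master plus $SLAVES slave instances
bin/hadoop-ec2 launch-cluster $CLUSTER $SLAVES

# Copy the Mahout job file up to the master
bin/hadoop-ec2 push $CLUSTER $JOBFILE

# Log in to the master and kick off the job there, e.g.:
#   hadoop jar $JOBFILE $DRIVER <input> <output> ...
bin/hadoop-ec2 login $CLUSTER

# Once you log out and the job is done, tear everything down
bin/hadoop-ec2 terminate-cluster $CLUSTER

Obviously the input data would need to be pushed into S3 or HDFS first; I've left that step out here.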
On Wed, Jul 15, 2009 at 5:40 PM, Grant Ingersoll <[email protected]> wrote:

> On Jul 15, 2009, at 5:25 PM, zaki rahaman wrote:
>
>> I hope I'm understanding your setup correctly, but by running on one
>> machine you're not fully exploiting the capabilities of Hadoop's
>> Map/Reduce. Gains in computation time will only be seen by increasing
>> the number of cores or nodes.
>
> Yep.
>
>> If you need access to more computing power, you might want to consider
>> using Amazon's EC2 (they have preconfigured AMIs for Hadoop, but you'd
>> have to configure and install Mahout, a process I'm not totally
>> familiar with yet, as I'm still trying to do it myself).
>
> Please add to http://cwiki.apache.org/MAHOUT/mahoutec2.html if you can.
> Given a Hadoop AMI, it shouldn't be all that hard to set up a Job, I
> wouldn't think. Would be good to have a script that does it, though.
>
> -Grant

--
Zaki Rahaman
