Very cool! Would love to hear more if you can share. Getting use cases
and "powered by" info out to the public is one of the key things we can
do to drive adoption and grow Mahout's capabilities.
On Jul 15, 2009, at 5:46 PM, zaki rahaman wrote:
I'm still prototyping something to make sure it works before I start
working on rolling it out for a large (~500GB) backlog of server data
that I want to work with. As such, I haven't looked seriously into using
EC2 until the test runs work well, but I plan on doing so in the next
couple of days. I'd be more than happy to write a script to run a Job or
work on a Mahout AMI config.
On Wed, Jul 15, 2009 at 5:40 PM, Grant Ingersoll <[email protected]> wrote:
On Jul 15, 2009, at 5:25 PM, zaki rahaman wrote:
I hope I'm understanding your setup correctly, but by running on one
machine you're not fully exploiting the capabilities of Hadoop's
Map/Reduce. Gains in computation time will only be seen by increasing
the number of cores or nodes.
Yep.
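For the single-machine case, that ceiling is the per-node task-slot
count. A minimal sketch of the relevant pre-0.21 settings, assuming a
0.19-era conf layout (the fragment file name and slot values below are
illustrative, not a tested config):

  #!/bin/sh
  # Pre-0.21 Hadoop bounds per-node parallelism with these two
  # properties. Merge the <property> elements below into the
  # <configuration> block of conf/hadoop-site.xml (conf/mapred-site.xml
  # on 0.20+), then restart the TaskTracker. Values assume a 4-core box:
  # one map slot per core, fewer reduce slots.
  cat > slots.xml <<'EOF'
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  EOF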
If you need access to more computing power, you might want to consider
using Amazon's EC2 (they have preconfigured AMIs for Hadoop, but you'd
have to configure and install Mahout, a process which I'm not totally
familiar with as of yet, as I'm still trying to do it myself).
Please add to http://cwiki.apache.org/MAHOUT/mahoutec2.html if you can.
Given a Hadoop AMI, it shouldn't be all that hard to set up a Job, I
wouldn't think. Would be good to have a script that does it, though.
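A rough sketch of what such a script might look like, assuming the
hadoop-ec2 helper scripts shipped in Hadoop's src/contrib/ec2 are on the
PATH and already configured with AWS credentials; the cluster name,
slave count, jar, driver class, and input/output paths are placeholders,
not real Mahout artifacts:

  #!/bin/sh
  # Hypothetical end-to-end run: boot a Hadoop cluster on EC2, push a
  # Mahout job jar to the master, run it there, then tear the cluster
  # down so the instances stop billing.
  CLUSTER=mahout-test           # placeholder cluster name
  SLAVES=10                     # placeholder slave count
  JOB_JAR=mahout-examples.job   # placeholder: your built Mahout job jar

  hadoop-ec2 launch-cluster "$CLUSTER" "$SLAVES"  # boot master + slaves
  hadoop-ec2 push "$CLUSTER" "$JOB_JAR"           # copy jar to the master
  hadoop-ec2 login "$CLUSTER"
  # ...then, from the master's shell, something like:
  #   hadoop jar mahout-examples.job org.example.MyMahoutJob input output
  # and afterwards, back on the local machine:
  hadoop-ec2 terminate-cluster "$CLUSTER"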
-Grant
--
Zaki Rahaman
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search