Very cool! Would love to hear more if you can share. Getting use cases
and "powered by" info out to the public is one of the key things we can
do to drive adoption and grow Mahout's capabilities.
On Jul 15, 2009, at 5:46 PM, zaki rahaman wrote:
I'm still prototyping something to make sure it works before I start
working on rolling it out for a large (~500GB) backlog of server data
that I want to work with. As such, I haven't looked seriously into using
EC2 until the test runs work well, but I plan on doing so in the next
couple of days. I'd be more than happy to write a script to run a Job or
work on a Mahout AMI config.
On Wed, Jul 15, 2009 at 5:40 PM, Grant Ingersoll <[email protected]> wrote:
On Jul 15, 2009, at 5:25 PM, zaki rahaman wrote:
I hope I'm understanding your setup correctly, but by running on one
machine you're not fully exploiting the capabilities of Hadoop's
Map/Reduce. Gains in computation time will only be seen by increasing
the number of cores or nodes.
Yep.
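For the single-machine case, that ceiling is the per-node task-slot
count. A minimal sketch of the relevant pre-0.21 settings, assuming a
0.19-era conf layout (the fragment file name and slot values below are
illustrative, not a tested config):

  #!/bin/sh
  # Pre-0.21 Hadoop bounds per-node parallelism with these two
  # properties. Merge the <property> elements below into the
  # <configuration> block of conf/hadoop-site.xml (conf/mapred-site.xml
  # on 0.20+), then restart the TaskTracker. Values assume a 4-core box:
  # one map slot per core, fewer reduce slots.
  cat > slots.xml <<'EOF'
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  EOF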
If you need access to more computing power, you might want to consider
using Amazon's EC2 (they have preconfigured AMIs for Hadoop, but you'd
have to configure and install Mahout, a process which I'm not totally
familiar with as of yet, as I'm still trying to do it myself).
Please add to http://cwiki.apache.org/MAHOUT/mahoutec2.html if you can.
Given a Hadoop AMI, it shouldn't be all that hard to set up a Job, I
wouldn't think. Would be good to have a script that does it, though.
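A rough sketch of what such a script might look like, assuming the
hadoop-ec2 helper scripts shipped in Hadoop's src/contrib/ec2 are on the
PATH and already configured with AWS credentials; the cluster name,
slave count, jar, driver class, and input/output paths are placeholders,
not real Mahout artifacts:

  #!/bin/sh
  # Hypothetical end-to-end run: boot a Hadoop cluster on EC2, push a
  # Mahout job jar to the master, run it there, then tear the cluster
  # down so the instances stop billing.
  CLUSTER=mahout-test           # placeholder cluster name
  SLAVES=10                     # placeholder slave count
  JOB_JAR=mahout-examples.job   # placeholder: your built Mahout job jar

  hadoop-ec2 launch-cluster "$CLUSTER" "$SLAVES"  # boot master + slaves
  hadoop-ec2 push "$CLUSTER" "$JOB_JAR"           # copy jar to the master
  hadoop-ec2 login "$CLUSTER"
  # ...then, from the master's shell, something like:
  #   hadoop jar mahout-examples.job org.example.MyMahoutJob input output
  # and afterwards, back on the local machine:
  hadoop-ec2 terminate-cluster "$CLUSTER"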
-Grant
--
Zaki Rahaman
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search