On May 19, 2009, at 7:11 AM, Grant Ingersoll wrote:


On May 19, 2009, at 6:59 AM, Tim Bass wrote:

Dear All,

A few months ago (on the developers' list) we briefly touched on the
idea of building a Mahout public AMI on EC2.

Subsequently, Amazon released EMR and a number of folks have
experimented with running sample Mahout jobs on EMR.

What are the pros and cons of creating a public Mahout AMI with Hadoop
and MapReduce configured with the versions that
are supported by the developers, in addition to Amazon's EMR implementation?

AFAICT, one issue seems to be that EMR locks you into a specific Hadoop version. Not sure if "locks" is too strong; maybe I should say it "encourages" you to use a specific version?

Actually, I think "locks" is more appropriate. They're using Hadoop 0.18.3 with some feature backports (according to what they said to me), so if you want features from a newer Hadoop (isn't 0.20 the current release? It looked like it had a lot of new stuff), you're pretty much done for.

Also, they charge extra for EMR jobs, which strikes me as a bit crazy (see Greg Linden's comments about variable pricing), and may strike some folks as a reason to run their own clusters.

As Ted and others pointed out, I think we would benefit from tools that make it easy to add Mahout to an AMI.

Perhaps you could base it off of one of the Cloudera Hadoop AMIs? They're publicly available, and they handle all the Hadoop business. I have no idea what the redistribution license would be, and I am most definitely not a lawyer!

Steve
--
Stephen Green                      //   [email protected]
Principal Investigator             \\   http://blogs.sun.com/searchguy
Aura Project                       //   Voice: +1 781-442-0926
Sun Microsystems Labs              \\   Fax:   +1 781-442-1692


