On Apr 14, 2009, at 5:17 PM, Grant Ingersoll wrote:

I would be concerned about the fact that EMR is using 0.18 and Mahout is on 0.19 (which of course raises another concern, expressed to me by Owen O'Malley at ApacheCon: no one uses 0.19).

Well, I did run Mahout locally on a 0.18.3 install, but that was writing to and reading from HDFS. I can build a custom mahout-examples that has the 0.18.3 Hadoop jars (or perhaps no Hadoop jar at all...). I'm guessing that if EMR is on 0.18.3 and it gets popular, then you're going to have to deal with that problem.

I'd say you should try reproducing the problem on the same version that Mahout uses.

That'll be a bit tricky in the EMR case, as that's Amazon's business (ask me about trying to get a 64-bit Solaris AMI on Amazon's version of Xen...).


FWIW, any committer on the Mahout project can likely get credits to use AWS.

I'm happy to share my limited experience.

Also:

----- Original Message ----
From: Sean Owen <[email protected]>
To: [email protected]
Sent: Tuesday, April 14, 2009 4:19:51 PM
Subject: Re: Mahout on Elastic MapReduce

This is a fairly uninformed observation, but: the error seems to be
from Hadoop. It seems to say that it understands hdfs:, but not s3n:,
and that makes sense to me. Do we expect Hadoop understands how to
read from S3? I would expect not. (Though, you point to examples that
seem to overcome this just fine?)

As Otis pointed out, Hadoop can handle S3 a couple of ways, and the example that I've been working with seems to be able to read its input data from an s3n URI with no problem.

When I have integrated code with stuff stored on S3, I have always had
to write extra glue code to copy from S3 to a local file system, do
work, then copy back.

I think you do need to copy from S3 to HDFS, but I think that happens automagically (? My Hadoop ignorance is starting to show!)
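To make the hdfs: versus s3n: point concrete, here is a rough, self-contained sketch of how a filesystem could be picked from a URI's scheme, and how code that only ever asks for the default filesystem would reject an s3n path. This is a toy model and not Hadoop's actual code: the class SchemeLookup, its methods, and the implementation labels are hypothetical, and the "fs.<scheme>.impl" key naming is only modelled loosely on Hadoop's configuration convention.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; none of these names come from Hadoop or Mahout.
public class SchemeLookup {

    // Stand-in for a configuration: "fs.<scheme>.impl" -> implementation label.
    // The labels are placeholders, not verified class names.
    static Map<String, String> defaultConfig() {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.hdfs.impl", "DistributedFileSystem");
        conf.put("fs.s3n.impl", "NativeS3FileSystem");
        return conf;
    }

    // Resolve an implementation from the URI's own scheme.
    static String implFor(URI uri, Map<String, String> conf) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("No scheme in URI: " + uri);
        }
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            throw new IllegalArgumentException("No filesystem for scheme: " + scheme);
        }
        return impl;
    }

    // Mimics code that assumes every path lives on the default filesystem:
    // a path with a different scheme is rejected.
    static void checkPath(URI defaultFs, URI path) {
        if (!defaultFs.getScheme().equalsIgnoreCase(path.getScheme())) {
            throw new IllegalArgumentException(
                "Wrong filesystem: " + path + ", expected scheme: " + defaultFs.getScheme());
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = defaultConfig();
        URI defaultFs = URI.create("hdfs://namenode:9000/");   // hypothetical address
        URI input = URI.create("s3n://example-bucket/input");  // hypothetical bucket

        // Resolving from the path's own scheme works for both kinds of URI.
        System.out.println(implFor(input, conf));

        // Assuming the default filesystem fails for the s3n path.
        try {
            checkPath(defaultFs, input);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the failure really is of this kind (which is a guess on my part), the usual remedy in real Hadoop code would be to ask the path for its own filesystem, e.g. via Path.getFileSystem(Configuration), rather than relying on the default one.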

Steve
--
Stephen Green                      //   [email protected]
Principal Investigator             \\   http://blogs.sun.com/searchguy
Aura Project                       //   Voice: +1 781-442-0926
Sun Microsystems Labs              \\   Fax:   +1 781-442-1692


