I've been using EMR for the public terabyte dataset project.
In general it's worked for me, with the following caveats:
1. EMR runs Hadoop 0.18.3, which meant I had to rework some of my code that
depended on newer (Hadoop 0.19.x) support.
2. It was kind of painful to get it running initially (setting up the
right credentials.json file, etc.).
3. You'll need S3 access, of course, which is another series of hoops
to jump through.
4. You really want to run in the mode where you create an EMR job with
no steps, then add steps to it. Otherwise you can waste a lot of
time firing up EMR jobs that fail immediately.
5. For bigger clusters, some of the Hadoop configuration parameters
aren't set very well.
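
To make point 4 concrete, here is a rough sketch of the "start with no steps, then add steps" pattern. It is not the tooling I used; it assumes the request shapes of the boto3 EMR client (run_job_flow and add_job_flow_steps), and every name, bucket path, and instance setting in it is a placeholder.

```python
# Sketch only: request shapes follow boto3's EMR client; all names, paths,
# and instance settings below are placeholders, not values from this project.

def keep_alive_cluster_request(name, instance_type, instance_count):
    """Build kwargs for run_job_flow that start a cluster with no steps."""
    return {
        "Name": name,
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster up while it has nothing to run, so a step
            # that fails right away does not cost another cluster start-up.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # No "Steps" key: the cluster starts idle and waits for steps.
    }


def jar_step(name, jar, args):
    """Build one step that runs a Hadoop job jar."""
    return {
        "Name": name,
        # CANCEL_AND_WAIT leaves the cluster running if this step fails.
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {"Jar": jar, "Args": list(args)},
    }


def start_then_add_step(emr, cluster_request, step):
    """Start the idle cluster, then submit a single step to it."""
    job_flow_id = emr.run_job_flow(**cluster_request)["JobFlowId"]
    emr.add_job_flow_steps(JobFlowId=job_flow_id, Steps=[step])
    return job_flow_id
```

With boto3 installed, emr would be boto3.client("emr"), and a real request needs more fields than shown here (a release label and IAM roles, for example). If a step fails, you fix the jar or arguments and call add_job_flow_steps again on the same cluster.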
-- Ken
On Jan 10, 2010, at 4:21pm, Benson Margulies wrote:
That's what I meant. I haven't tried it yet, so I've got the same
question Jake has.
On Sun, Jan 10, 2010 at 6:27 PM, Jake Mannix <[email protected]> wrote:
You mean Elastic MapReduce (EMR)? Has anyone here had any luck with that
for this or other projects?
-jake
On Jan 10, 2010 3:21 PM, "Benson Margulies" <[email protected]> wrote:
Stupid question: I thought there was a way to use the cloud as a
hadoop farm directly without having to configure instances.
On Sun, Jan 10, 2010 at 6:18 PM, Sean Owen <[email protected]> wrote:
I like the Alestic instances...
--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g