Re: AWS MapReduce

John Conwell Mon, 05 Mar 2012 07:40:40 -0800

AWS MapReduce (EMR) does not use S3 for its HDFS persistence.  If it did,
your S3 bill would be massive :)  EMR reads all input jar files and
input data from S3, but it copies these files down to its local disk.  It
then starts the MR process, doing all HDFS reads and writes on the
local disks.  At the end of the MR job, it copies the MR job output and all
process logs to S3, and then tears down the VM instances.

You can see this for yourself if you spin up a small EMR cluster, but turn
off the configuration flag that kills the VMs at the end of the MR job.
Then look at the Hadoop configuration files to see how Hadoop is
configured.
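
As a rough sketch of that last step (not something that ships with EMR), the
Python below pulls a few storage-related properties out of a Hadoop *-site.xml
file. The property names (fs.default.name, dfs.data.dir, dfs.name.dir) are the
standard Hadoop 1.x ones; the helper name read_hadoop_props and the sample XML
values are made up for illustration, and where the config files live on an EMR
node is something to check on the node itself.

```python
import xml.etree.ElementTree as ET

# Hadoop 1.x property names that show where HDFS lives.
STORAGE_KEYS = ("fs.default.name", "dfs.data.dir", "dfs.name.dir")


def read_hadoop_props(xml_text, keys=STORAGE_KEYS):
    """Return {name: value} for the requested properties in a *-site.xml document."""
    root = ET.fromstring(xml_text)
    found = {}
    # Each setting is a <property> element with <name> and <value> children.
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        if name in keys:
            found[name] = prop.findtext("value", default="").strip()
    return found


# Hypothetical sample; real values come from the files on the cluster node.
SAMPLE_XML = """<?xml version="1.0"?>
<configuration>
  <property><name>fs.default.name</name><value>hdfs://10.0.0.1:9000</value></property>
  <property><name>dfs.data.dir</name><value>/mnt/var/lib/hadoop/dfs</value></property>
  <property><name>io.sort.mb</name><value>200</value></property>
</configuration>"""

if __name__ == "__main__":
    for name, value in sorted(read_hadoop_props(SAMPLE_XML).items()):
        print(name, "=", value)
```

To use it on a running node, read the contents of the relevant *-site.xml file
and pass the text in. A dfs.data.dir that points at a local mount rather than
an S3 location would be consistent with HDFS living on the instance disks.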

I really like EMR.  Amazon has done a lot of work to optimize the Hadoop
configurations and VM instance AMIs to execute MR jobs fairly efficiently
on a VM cluster.  I had to do a lot of (expensive) trial and error work to
figure out an optimal Hadoop / VM configuration to run our MR jobs without
crashing / timing out the jobs.  The only reason we didn't standardize on
EMR was that it strongly binds your code base / process to using EMR for
Hadoop processing, vs a flexible infrastructure that could use a local
cluster or a cluster on a different cloud provider.

On Sun, Mar 4, 2012 at 8:51 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

> As far as I can see in the docs, it looks like you could also use HDFS instead
> of S3. But what I am not sure of is whether these are local disks or EBS.
>
> On Sun, Mar 4, 2012 at 2:27 AM, Hannes Carl Meyer
> <hannesc...@googlemail.com> wrote:
>
> > Hi,
> >
> > yes, it's loaded from S3. Imho Amazon AWS Map-Reduce is pretty slow.
> > The setup is done pretty fast and there are some configuration parameters
> > you can bypass - for example blocksizes etc. - but in the end imho
> > setting up ec2 instances by copying images is the better alternative.
> >
> > Kind Regards
> >
> > Hannes
> >
> > On Sun, Mar 4, 2012 at 2:31 AM, Mohit Anchlia <mohitanch...@gmail.com>
> > wrote:
> >
> > > I think I found the answer to this question. However, it's still not
> > > clear if HDFS is on local disk or EBS volumes. Does anyone know?
> > >
> > > On Sat, Mar 3, 2012 at 3:54 PM, Mohit Anchlia <mohitanch...@gmail.com>
> > > wrote:
> > >
> > > > Just want to check how many are using AWS MapReduce and understand the
> > > > pros and cons of Amazon's MapReduce machines. Is it true that these map
> > > > reduce machines are really reading and writing from S3 instead of local
> > > > disks? Has anyone found issues with Amazon MapReduce, and how does it
> > > > compare with using MapReduce on locally attached disks instead of S3?
> > >
> >
> > ---
> > www.informera.de
> > Hadoop & Big Data Services
> >
>

-- 

Thanks,
John C
