AWS MapReduce (EMR) does not use S3 for its HDFS persistence. If it did, your S3 billing would be massive :) EMR reads all input jar files and input data from S3, but it copies these files down to its local disk. It then starts the MR process, doing all HDFS reads and writes to the local disks. At the end of the MR job, it copies the MR job output and all process logs to S3, and then tears down the VM instances.
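If you want to check where HDFS lives on a running cluster, a rough way is to pull the storage-related properties out of the Hadoop site files. The sketch below is only an illustration: the function names are mine, the sample XML and its values are made up, and it assumes the Hadoop 1.x property names (fs.default.name, dfs.data.dir, dfs.name.dir). On a real node you would read the actual *-site.xml files, whose location depends on the install.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def storage_summary(props):
    """Keep only the properties that say where the filesystem and its blocks live."""
    # Hadoop 1.x property names; newer releases renamed some of these.
    keys = ("fs.default.name", "dfs.data.dir", "dfs.name.dir")
    return {k: props[k] for k in keys if k in props}


# Made-up sample, NOT copied from a real EMR node.
SAMPLE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master.example:9000</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/example/dfs</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name, value in storage_summary(read_hadoop_properties(SAMPLE)).items():
        print(name, "=", value)
```

If fs.default.name is an hdfs:// URI and dfs.data.dir points at a path on the instance's own disks, HDFS is on local storage rather than S3. Whether that path is backed by instance storage or an EBS volume is something you would still have to check from the mounts on the node.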
You can see this for yourself if you spin up a small EMR cluster, but turn off the configuration flag that kills the VMs at the end of the MR job. Then look at the Hadoop configuration files to see how Hadoop is configured.

I really like EMR. Amazon has done a lot of work to optimize the Hadoop configurations and VM instance AMIs to execute MR jobs fairly efficiently on a VM cluster. On our own setup, I had to do a lot of (expensive) trial and error work to figure out an optimal Hadoop / VM configuration to run our MR jobs without crashing / timing out the jobs.

The only reason we didn't standardize on EMR was that it strongly binds your code base / process to using EMR for Hadoop processing, vs a flexible infrastructure that could use a local cluster or a cluster on a different cloud provider.

On Sun, Mar 4, 2012 at 8:51 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

> As far as I see in the docs it looks like you could also use hdfs instead
> of s3. But what I am not sure is if these are local disks or EBS.
>
> On Sun, Mar 4, 2012 at 2:27 AM, Hannes Carl Meyer <hannesc...@googlemail.com> wrote:
>
> > Hi,
> >
> > yes, its loaded from S3. Imho is Amazon AWS Map-Reduce pretty slow.
> > The setup is done pretty fast and there are some configuration parameters
> > you can bypass - for example blocksizes etc. - but in the end imho setting
> > up ec2 instances by copying images is the better alternative.
> >
> > Kind Regards
> >
> > Hannes
> >
> > On Sun, Mar 4, 2012 at 2:31 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> >
> > > I think found answer to this question. However, it's still not clear if
> > > HDFS is on local disk or EBS volumes. Does anyone know?
> > >
> > > On Sat, Mar 3, 2012 at 3:54 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> > >
> > > > Just want to check how many are using AWS mapreduce and understand the
> > > > pros and cons of Amazon's MapReduce machines? Is it true that these map
> > > > reduce machines are really reading and writing from S3 instead of local
> > > > disks? Has anyone found issues with Amazon MapReduce and how does it
> > > > compare with using MapReduce on local attached disks compared to using
> > > > S3.
> >
> > ---
> > www.informera.de
> > Hadoop & Big Data Services

--
Thanks,
John C