Thank you for sharing. Sent from my mobile. Please excuse the typos.
On 2010-12-27, at 11:18 AM, Sudhir Vallamkondu <[email protected]> wrote:

> We recently crossed this bridge and here are some insights. We did an
> extensive study comparing costs and benchmarking local vs EMR for our
> current needs and future trend.
>
> - The scalability you get with EMR is unmatched, although you need to look
> at your requirements and decide whether this is something you need.
>
> - When using EMR it's cheaper to use reserved instances vs nodes on the fly.
> You can always add more nodes when required. I suggest looking at your
> current computing needs, reserving instances for a year or two, using
> these to run EMR, and adding nodes at peak needs. In your cost estimation
> you will need to factor in the data transfer time/costs unless you are
> dealing with public datasets on S3.
>
> - EMR fared similarly to the local cluster on CPU benchmarks (we used MRBench
> to benchmark map/reduce); however, IO benchmarks were slow on EMR (we used
> the DFSIO benchmark). For IO-intensive jobs you will need to add more nodes
> to compensate for this.
>
> - When compared to a local cluster, you will need to factor in the time it
> takes for the EMR cluster to set up when starting a job: things like data
> transfer time, cluster replication time, etc.
>
> - The EMR API is very flexible; however, you will need to build a custom
> interface on top of it to suit your job management and monitoring needs.
>
> - EMR bootstrap actions can satisfy most of your native lib needs, so no
> drawbacks there.
>
> -- Sudhir
>
> On 12/26/10 5:26 AM, "[email protected]"
> <[email protected]> wrote:
>
>> From: Otis Gospodnetic <[email protected]>
>> Date: Fri, 24 Dec 2010 04:41:46 -0800 (PST)
>> To: <[email protected]>
>> Subject: Re: Hadoop/Elastic MR on AWS
>>
>> Hello Amandeep,
>>
>> ----- Original Message ----
>>> From: Amandeep Khurana <[email protected]>
>>> To: [email protected]
>>> Sent: Fri, December 10, 2010 1:14:45 AM
>>> Subject: Re: Hadoop/Elastic MR on AWS
>>>
>>> Mark,
>>>
>>> Using EMR makes it very easy to start a cluster and add/reduce capacity as
>>> and when required. There are certain optimizations that make EMR an
>>> attractive choice as compared to building your own cluster out. Using EMR
>>
>> Could you please point out what optimizations you are referring to?
>>
>> Thanks,
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> Hadoop ecosystem search :: http://search-hadoop.com/
>>
>>> also ensures you are using a production quality, stable system backed by
>>> the EMR engineers. You can always use bootstrap actions to put your own
>>> tweaked version of Hadoop in there if you want to do that.
>>>
>>> Also, you don't have to tear down your cluster after every job. You can set
>>> the alive option when you start your cluster and it will stay there even
>>> after your Hadoop job completes.
>>>
>>> If you face any issues with EMR, send me a mail offline and I'll be happy
>>> to help.
>>>
>>> -Amandeep
>>>
>>> On Thu, Dec 9, 2010 at 9:47 PM, Mark <[email protected]> wrote:
>>>
>>>> Does anyone have any thoughts/experiences on running Hadoop in AWS? What
>>>> are some pros/cons?
>>>>
>>>> Are there any good AMI's out there for this?
>>>>
>>>> Thanks for any advice.
>>>>
>
> iCrossing Privileged and Confidential Information
> This email message is for the sole use of the intended recipient(s) and may
> contain confidential and privileged information of iCrossing. Any
> unauthorized review, use, disclosure or distribution is prohibited. If you
> are not the intended recipient, please contact the sender by reply email and
> destroy all copies of the original message.
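
For anyone who wants to play with the numbers, the cost estimate Sudhir describes (reserved instances for the steady baseline, on-the-fly nodes at peak, plus data transfer) could be sketched roughly as below. This is only an illustration: the `Pricing` class, the `yearly_cost` function, and every rate in the example are hypothetical placeholders, not real AWS prices, and the sketch ignores any other charges that may apply.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical rates; substitute the figures from your own quote."""
    reserved_upfront_per_node: float  # one-time fee per reserved node for the year
    reserved_hourly: float            # hourly rate for a reserved node
    on_demand_hourly: float           # hourly rate for a node started on the fly
    transfer_per_gb: float            # cost of moving 1 GB of data in


def yearly_cost(pricing, baseline_nodes, baseline_hours,
                peak_nodes, peak_hours, transfer_gb, reserve_baseline):
    """Estimate one year of cost for a baseline cluster plus peak capacity.

    baseline_hours and peak_hours are hours per node over the year.
    Peak nodes are always priced on demand; the baseline is priced as
    reserved or on demand depending on reserve_baseline.
    """
    if reserve_baseline:
        baseline = baseline_nodes * (
            pricing.reserved_upfront_per_node
            + pricing.reserved_hourly * baseline_hours
        )
    else:
        baseline = baseline_nodes * pricing.on_demand_hourly * baseline_hours

    peak = peak_nodes * pricing.on_demand_hourly * peak_hours
    transfer = transfer_gb * pricing.transfer_per_gb
    return baseline + peak + transfer


# Made-up example: 4 baseline nodes busy 2000 hours each, 2 extra nodes
# for 100 hours each at peak, and 800 GB of data to move in.
example = Pricing(reserved_upfront_per_node=200.0, reserved_hourly=0.25,
                  on_demand_hourly=0.5, transfer_per_gb=0.125)
usage = dict(baseline_nodes=4, baseline_hours=2000,
             peak_nodes=2, peak_hours=100, transfer_gb=800)

reserved_total = yearly_cost(example, reserve_baseline=True, **usage)
on_demand_total = yearly_cost(example, reserve_baseline=False, **usage)
print("reserved baseline:", reserved_total)
print("on-demand baseline:", on_demand_total)
```

Whether reserving wins depends entirely on how many hours the baseline nodes are actually busy, so it is worth running this with your own utilization and the rates you are quoted before committing to a term.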
