Hi Sudhir,

Can you publish your findings around pricing, and how you calculated the
various aspects? This is great information.

Thanks
Dave Viner

On Mon, Dec 27, 2010 at 10:17 AM, Sudhir Vallamkondu <[email protected]> wrote:

> We recently crossed this bridge and here are some insights. We did an
> extensive study comparing costs and benchmarking local vs EMR for our
> current needs and future trend.
>
> - The scalability you get with EMR is unmatched, although you need to look
>   at your requirements and decide whether this is something you need.
>
> - When using EMR it's cheaper to use reserved instances than to launch
>   nodes on the fly. You can always add more nodes when required. I suggest
>   looking at your current computing needs, reserving instances for a year
>   or two, using these to run EMR, and adding nodes at peak needs. In your
>   cost estimation you will need to factor in the data transfer time/costs
>   unless you are dealing with public datasets on S3.
>
> - EMR fared similarly to the local cluster on CPU benchmarks (we used
>   MRBench to benchmark map/reduce); however, IO benchmarks were slow on EMR
>   (we used the DFSIO benchmark). For IO-intensive jobs you will need to add
>   more nodes to compensate for this.
>
> - Compared to a local cluster, you will need to factor in the time it takes
>   for the EMR cluster to set up when starting a job: things like data
>   transfer time, cluster replication time, etc.
>
> - The EMR API is very flexible; however, you will need to build a custom
>   interface on top of it to suit your job management and monitoring needs.
>
> - EMR bootstrap actions can satisfy most of your native lib needs, so no
>   drawbacks there.
>
> -- Sudhir
>
>
> On 12/26/10 5:26 AM, "[email protected]"
> <[email protected]> wrote:
>
> > From: Otis Gospodnetic <[email protected]>
> > Date: Fri, 24 Dec 2010 04:41:46 -0800 (PST)
> > To: <[email protected]>
> > Subject: Re: Hadoop/Elastic MR on AWS
> >
> > Hello Amandeep,
> >
> > ----- Original Message ----
> >> From: Amandeep Khurana <[email protected]>
> >> To: [email protected]
> >> Sent: Fri, December 10, 2010 1:14:45 AM
> >> Subject: Re: Hadoop/Elastic MR on AWS
> >>
> >> Mark,
> >>
> >> Using EMR makes it very easy to start a cluster and add/reduce capacity as
> >> and when required. There are certain optimizations that make EMR an
> >> attractive choice as compared to building your own cluster out. Using EMR
> >
> > Could you please point out what optimizations you are referring to?
> >
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > Hadoop ecosystem search :: http://search-hadoop.com/
> >
> >> also ensures you are using a production quality, stable system backed by the
> >> EMR engineers. You can always use bootstrap actions to put your own tweaked
> >> version of Hadoop in there if you want to do that.
> >>
> >> Also, you don't have to tear down your cluster after every job. You can set
> >> the alive option when you start your cluster and it will stay there even
> >> after your Hadoop job completes.
> >>
> >> If you face any issues with EMR, send me a mail offline and I'll be happy to
> >> help.
> >>
> >> -Amandeep
> >>
> >>
> >> On Thu, Dec 9, 2010 at 9:47 PM, Mark <[email protected]> wrote:
> >>
> >>> Does anyone have any thoughts/experiences on running Hadoop in AWS? What
> >>> are some pros/cons?
> >>>
> >>> Are there any good AMI's out there for this?
> >>>
> >>> Thanks for any advice.
