Hi Sudhir,

Can you publish your findings around pricing, and how you calculated the
various aspects?

This is great information.

Thanks
Dave Viner


On Mon, Dec 27, 2010 at 10:17 AM, Sudhir Vallamkondu <
[email protected]> wrote:

> We recently crossed this bridge and here are some insights. We did an
> extensive study comparing costs and benchmarking local vs EMR for our
> current needs and future trends.
>
> - The scalability you get with EMR is unmatched, although you need to look at
> your requirements and decide whether this is something you need.
>
> - When using EMR, it's cheaper to use reserved instances than nodes launched
> on the fly. You can always add more nodes when required. I suggest looking at
> your current computing needs, reserving instances for a year or two, using
> these to run EMR, and adding nodes at peak needs. In your cost estimation you
> will need to factor in the data transfer time/costs unless you are dealing
> with public datasets on S3.
>
> - EMR fared similarly to the local cluster on CPU benchmarks (we used MRBench
> to benchmark map/reduce); however, IO benchmarks were slow on EMR (we used the
> DFSIO benchmark). For IO-intensive jobs you will need to add more nodes to
> compensate for this.
>
> - When comparing with a local cluster, you will need to factor in the time it
> takes for the EMR cluster to set up when starting a job: things like data
> transfer time, cluster replication time, etc.
>
> - The EMR API is very flexible; however, you will need to build a custom
> interface on top of it to suit your job management and monitoring needs.
>
> - EMR bootstrap actions can satisfy most of your native lib needs so no
> drawbacks there.
>
>
> -- Sudhir
>
>
> On 12/26/10 5:26 AM, "[email protected]"
> <[email protected]> wrote:
>
> > From: Otis Gospodnetic <[email protected]>
> > Date: Fri, 24 Dec 2010 04:41:46 -0800 (PST)
> > To: <[email protected]>
> > Subject: Re: Hadoop/Elastic MR on AWS
> >
> > Hello Amandeep,
> >
> >
> >
> > ----- Original Message ----
> >> From: Amandeep Khurana <[email protected]>
> >> To: [email protected]
> >> Sent: Fri, December 10, 2010 1:14:45 AM
> >> Subject: Re: Hadoop/Elastic MR on AWS
> >>
> >> Mark,
> >>
> >> Using EMR makes it very easy to start a cluster and add/reduce capacity as
> >> and when required. There are certain optimizations that make EMR an
> >> attractive choice as compared to building your own cluster out. Using EMR
> >
> >
> > Could you please point out what optimizations you are referring to?
> >
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > Hadoop ecosystem search :: http://search-hadoop.com/
> >
> >> also ensures you are using a production quality, stable system backed by the
> >> EMR engineers. You can always use bootstrap actions to put your own tweaked
> >> version of Hadoop in there if you want to do that.
> >>
> >> Also, you don't have to tear down your cluster after every job. You can set
> >> the alive option when you start your cluster and it will stay there even
> >> after your Hadoop job completes.
> >>
> >> If you face any issues with EMR, send me a mail offline and I'll be happy to
> >> help.
> >>
> >> -Amandeep
> >>
> >>
> >> On Thu, Dec 9, 2010 at 9:47 PM, Mark <[email protected]> wrote:
> >>
> >>> Does anyone have any thoughts/experiences on running Hadoop in AWS? What
> >>> are some pros/cons?
> >>>
> >>> Are there any good AMIs out there for this?
> >>>
> >>> Thanks for any advice.
> >>>
> >>
>

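For anyone wanting to reproduce the kind of estimate discussed in the quoted message (reserved instances for the baseline, extra nodes at peak, plus data transfer), here is a rough sketch in Python. It is not Sudhir's actual calculation, which was not posted. The names `ClusterPlan` and `yearly_cost` are made up for illustration, and every rate is a placeholder parameter to be filled in from the current AWS price list. The EMR per-hour surcharge and cluster startup time are left out.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    """Hypothetical description of a yearly workload (illustrative only)."""
    baseline_nodes: int           # nodes kept on reserved instances all year
    peak_extra_nodes: int         # extra on-demand nodes added at peak
    peak_hours_per_year: float    # hours per year the extra nodes run
    hours_per_year: float = 24 * 365


def yearly_cost(plan, reserved_upfront_per_node, reserved_hourly,
                on_demand_hourly, transfer_gb, transfer_cost_per_gb):
    """Estimate yearly cost: reserved baseline + on-demand peak + data transfer.

    All rates are caller-supplied assumptions, not real AWS prices.
    """
    # Reserved nodes: one-time fee plus the discounted hourly rate all year.
    baseline = plan.baseline_nodes * (
        reserved_upfront_per_node + reserved_hourly * plan.hours_per_year
    )
    # Peak nodes: on-demand rate, only for the hours they are running.
    peak = plan.peak_extra_nodes * on_demand_hourly * plan.peak_hours_per_year
    # Data moved in or out; zero if the data already lives on S3.
    transfer = transfer_gb * transfer_cost_per_gb
    return baseline + peak + transfer


if __name__ == "__main__":
    # Placeholder numbers purely to show the call shape.
    plan = ClusterPlan(baseline_nodes=10, peak_extra_nodes=5,
                       peak_hours_per_year=100)
    print(yearly_cost(plan,
                      reserved_upfront_per_node=100.0,
                      reserved_hourly=0.1,
                      on_demand_hourly=0.5,
                      transfer_gb=1000,
                      transfer_cost_per_gb=0.1))
```

Running the same function with `baseline_nodes=0` and all nodes counted as on-demand gives the comparison figure for a purely on-the-fly cluster.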