Re: Hadoop/Elastic MR on AWS

James Seigel Mon, 27 Dec 2010 11:04:25 -0800

Thank you for sharing.

Sent from my mobile. Please excuse the typos.


On 2010-12-27, at 11:18 AM, Sudhir Vallamkondu
<[email protected]> wrote:

> We recently crossed this bridge and here are some insights. We did an
> extensive study comparing costs and benchmarking local vs EMR for our
> current needs and future trends.
>
> - The scalability you get with EMR is unmatched, although you need to look at
> your requirements and decide whether this is something you need.
>
> - When using EMR, it's cheaper to use reserved instances than to launch nodes
> on the fly. You can always add more nodes when required. I suggest looking at
> your current computing needs, reserving instances for a year or two, using
> these to run EMR, and adding nodes at peak times. In your cost estimation you
> will need to factor in the data transfer time/costs unless you are dealing
> with public datasets on S3.
>
> - EMR fared similarly to our local cluster on CPU benchmarks (we used MRBench
> to benchmark map/reduce); however, IO benchmarks were slow on EMR (we used the
> DFSIO benchmark). For IO-intensive jobs you will need to add more nodes to
> compensate for this.
>
> - Compared to a local cluster, you will need to factor in the time it takes
> for the EMR cluster to set up when starting a job: things like data transfer
> time, cluster replication time, etc.
>
> - The EMR API is very flexible; however, you will need to build a custom
> interface on top of it to suit your job management and monitoring needs.
>
> - EMR bootstrap actions can satisfy most of your native library needs, so no
> drawbacks there.
>
>
> -- Sudhir
>
>
> On 12/26/10 5:26 AM, "[email protected]"
> <[email protected]> wrote:
>
>> From: Otis Gospodnetic <[email protected]>
>> Date: Fri, 24 Dec 2010 04:41:46 -0800 (PST)
>> To: <[email protected]>
>> Subject: Re: Hadoop/Elastic MR on AWS
>>
>> Hello Amandeep,
>>
>>
>>
>> ----- Original Message ----
>>> From: Amandeep Khurana <[email protected]>
>>> To: [email protected]
>>> Sent: Fri, December 10, 2010 1:14:45 AM
>>> Subject: Re: Hadoop/Elastic MR on AWS
>>>
>>> Mark,
>>>
>>> Using EMR makes it very easy to start a cluster and add/reduce capacity as
>>> and when required. There are certain optimizations that make EMR an
>>> attractive choice as compared to building your own cluster out. Using EMR
>>
>>
>> Could you please point out what optimizations you are referring to?
>>
>> Thanks,
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> Hadoop ecosystem search :: http://search-hadoop.com/
>>
>>> also ensures you are using a production-quality, stable system backed by the
>>> EMR engineers. You can always use bootstrap actions to put your own tweaked
>>> version of Hadoop in there if you want to do that.
>>>
>>> Also, you don't have to tear down your cluster after every job. You can set
>>> the alive option when you start your cluster and it will stay there even
>>> after your Hadoop job completes.
>>>
>>> If you face any issues with EMR, send me a mail offline and I'll be happy to
>>> help.
>>>
>>> -Amandeep
>>>
>>>
>>> On Thu, Dec 9, 2010 at 9:47 PM, Mark <[email protected]> wrote:
>>>
>>>> Does anyone have any thoughts/experiences on running Hadoop in AWS? What
>>>> are some pros/cons?
>>>>
>>>> Are there any good AMIs out there for this?
>>>>
>>>> Thanks for any advice.
>>>>
>>>
>
>
>
>
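
Sudhir's cost advice (reserve instances for the steady baseline, add on-demand nodes at peak, and include data transfer) can be sketched as a small estimate. This is only a rough illustration: the `Pricing` fields, the function names, and every price below are hypothetical placeholders, not actual AWS or EMR rates, and the model ignores cluster setup time and storage costs.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical prices; replace with the real rates for your instance type."""
    on_demand_hourly: float   # price per node-hour when launched on the fly
    reserved_upfront: float   # one-time fee per reserved node for the year
    reserved_hourly: float    # discounted price per node-hour once reserved
    transfer_per_gb: float    # price per GB moved in or out of the cloud


def yearly_cost(pricing, baseline_nodes, baseline_hours,
                peak_nodes, peak_hours, transfer_gb, reserve_baseline):
    """Estimate one year of cost for a baseline cluster plus peak capacity."""
    if reserve_baseline:
        # Reserved nodes pay the upfront fee plus the discounted hourly rate.
        baseline = baseline_nodes * (
            pricing.reserved_upfront + pricing.reserved_hourly * baseline_hours
        )
    else:
        baseline = baseline_nodes * pricing.on_demand_hourly * baseline_hours
    # Extra nodes added at peak times are always billed on demand here.
    peak = peak_nodes * pricing.on_demand_hourly * peak_hours
    # Data transfer is zero if the input is a public dataset already on S3.
    transfer = transfer_gb * pricing.transfer_per_gb
    return baseline + peak + transfer


def break_even_hours(pricing):
    """Node-hours per year above which reserving a node beats on-demand."""
    saving_per_hour = pricing.on_demand_hourly - pricing.reserved_hourly
    return pricing.reserved_upfront / saving_per_hour


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    prices = Pricing(on_demand_hourly=0.10, reserved_upfront=200.0,
                     reserved_hourly=0.04, transfer_per_gb=0.10)
    for reserve in (True, False):
        total = yearly_cost(prices, baseline_nodes=10, baseline_hours=8760,
                            peak_nodes=20, peak_hours=200,
                            transfer_gb=1000, reserve_baseline=reserve)
        print(f"reserve_baseline={reserve}: {total:.2f}")
    print(f"break-even node-hours per year: {break_even_hours(prices):.0f}")
```

If the baseline nodes run fewer hours per year than the break-even figure, reserving them costs more than launching on demand, which is why the estimate should start from your actual utilization.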
