Daniel, 

Do you mind sharing the size of your cluster and your production data volumes?

Thanks
Soumya 

> On Jul 7, 2014, at 3:39 PM, Daniel Siegmann <daniel.siegm...@velos.io> wrote:
> 
> From a development perspective, I vastly prefer Spark to MapReduce. The 
> MapReduce API is very constrained; Spark's API feels much more natural to me. 
> Testing and local development are also very easy: creating a local Spark 
> context is trivial, and it reads local files. For your unit tests, you can 
> just have them create a local context and run your flow on some test data. 
> Even better, you can do ad-hoc work in the Spark shell, and if you want that 
> in your production code it will look exactly the same.
> 
> Unfortunately, the picture isn't so rosy when it comes to production. In my 
> experience, Spark simply doesn't scale to the volumes that MapReduce will 
> handle. At least not with a standalone cluster; maybe Mesos or YARN would be 
> better, but I haven't had the opportunity to try them. I find that jobs on 
> large data sets (though smaller than what I push through MapReduce) tend to 
> hang indefinitely for no apparent reason.
> 
> I am hopeful the situation will improve (Spark is developing quickly), but 
> if you have large amounts of data you should proceed with caution.
> 
> Keep in mind that some Hadoop frameworks, e.g. Apache Crunch, hide the ugly 
> MapReduce API behind something very similar in form to Spark's API, so you 
> might consider those as well.
> 
> (Note: the above is with Spark 1.0.0.)
> 
> 
> 
>> On Mon, Jul 7, 2014 at 11:07 AM, <santosh.viswanat...@accenture.com> wrote:
>> Hello Experts,
>> 
>> I am doing a comparative study of the following:
>> 
>> Spark vs. Impala
>> 
>> Spark vs. MapReduce: is it worth migrating from an existing MR implementation 
>> to Spark?
>> 
>> Please share your thoughts and expertise.
>> 
>> Thanks,
>> Santosh
> 
> 
> 
> -- 
> Daniel Siegmann, Software Developer
> Velos
> Accelerating Machine Learning
> 
> 440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
> E: daniel.siegm...@velos.io W: www.velos.io
