Thanks Daniel for sharing this info.

Regards,
Santosh Karthikeyan

From: Daniel Siegmann [mailto:daniel.siegm...@velos.io]
Sent: Tuesday, July 08, 2014 1:10 AM
To: user@spark.apache.org
Subject: Re: Comparative study

From a development perspective, I vastly prefer Spark to MapReduce. The 
MapReduce API is very constrained; Spark's API feels much more natural to me. 
Testing and local development are also very easy: creating a local Spark 
context is trivial, and it reads local files. For your unit tests you can just 
have them create a local context and execute your flow with some test data. 
Even better, you can do ad-hoc work in the Spark shell, and if you want that in 
your production code it will look exactly the same.
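
To illustrate the local-context testing described above, here is a minimal
sketch, assuming the PySpark API (the thread does not say which language
binding is in use). The names word_counts and run_with_local_context are
hypothetical, and the flow uses an in-memory list via parallelize instead of a
local file to keep the example self-contained.

```python
from operator import add


def word_counts(sc, lines):
    """The 'flow' under test: count word occurrences in an iterable of lines.

    `sc` is expected to be a SparkContext (hypothetical helper, not from the
    original message).
    """
    return dict(
        sc.parallelize(lines)                 # distribute the test data
        .flatMap(lambda line: line.split())   # one record per word
        .map(lambda word: (word, 1))          # pair each word with a count
        .reduceByKey(add)                     # sum the counts per word
        .collect()                            # bring results to the driver
    )


def run_with_local_context():
    # Imported here so the flow above can be defined without Spark installed.
    from pyspark import SparkContext

    # "local[2]" runs Spark inside this process with two worker threads.
    sc = SparkContext("local[2]", "word-count-test")
    try:
        return word_counts(sc, ["spark is fast", "spark is local"])
    finally:
        sc.stop()  # release the context so later tests can create their own
```

A unit test could call run_with_local_context() and assert on the returned
dictionary; the same word_counts function could then be handed the cluster's
context in production.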

Unfortunately, the picture isn't so rosy when it comes to production. In my 
experience, Spark simply doesn't scale to the volumes that MapReduce will 
handle. Not with a Standalone cluster anyway; maybe Mesos or YARN would be 
better, but I haven't had the opportunity to try them. I find that jobs tend to 
hang forever for no apparent reason on large data sets (though still smaller 
than what I push through MapReduce).

I am hopeful the situation will improve - Spark is developing quickly - but if 
you have large amounts of data you should proceed with caution.

Keep in mind there are some frameworks for Hadoop which can hide the ugly 
MapReduce API behind something very similar in form to Spark's API, e.g. 
Apache Crunch. So you might consider those as well.

(Note: the above is with Spark 1.0.0.)


On Mon, Jul 7, 2014 at 11:07 AM, <santosh.viswanat...@accenture.com> wrote:
Hello Experts,

I am doing a comparative study of the following:

Spark vs Impala
Spark vs MapReduce. Is it worth migrating from an existing MR implementation 
to Spark?


Please share your thoughts and expertise.


Thanks,
Santosh

--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegm...@velos.io W: www.velos.io
