Re: Comparative study

Nabeel Memon Mon, 07 Jul 2014 17:06:07 -0700

For a Scala API on MapReduce (the Hadoop engine), there's a library called
"Scalding". It's built on top of Cascading. If you have a huge dataset, or
if you are considering the MapReduce engine for your job for any reason, you
can try Scalding.
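
As a rough illustration of what a Scalding job looks like, here is a minimal
word-count sketch using Scalding's typed API. The names `WordCount`,
`WordCountJob`, and `tokenize`, and the `input`/`output` argument names, are
made up for this example, and it assumes the Scalding library is on the
classpath:

```scala
import com.twitter.scalding._

// Hypothetical helper object; kept free of Scalding so it can be unit-tested alone.
object WordCount {
  // Lowercase the line and split on runs of non-word characters.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq
}

// Hypothetical job: counts word occurrences in a text file.
// Run with --input <path> --output <path> (argument names are assumptions).
class WordCountJob(args: Args) extends Job(args) {
  TypedPipe.from(TextLine(args("input")))        // one String per input line
    .flatMap(line => WordCount.tokenize(line))   // map side: emit words
    .groupBy(word => word)                       // shuffle: group identical words
    .size                                        // reduce side: count per word
    .write(TypedTsv[(String, Long)](args("output")))
}
```

The flatMap / groupBy / size chain reads much like a collection-style API,
but it executes as MapReduce jobs through Cascading.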


However, Spark vs Impala doesn't make sense to me. It should really have been
Shark vs Impala. Both are SQL query engines: Shark is built on top of Spark,
while Impala runs on a Hadoop cluster with its own execution engine rather
than MapReduce.


On Mon, Jul 7, 2014 at 4:06 PM, <santosh.viswanat...@accenture.com> wrote:

>  Thanks Daniel for sharing this info.
>
>
>
> Regards,
> Santosh Karthikeyan
>
>
>
> *From:* Daniel Siegmann [mailto:daniel.siegm...@velos.io]
> *Sent:* Tuesday, July 08, 2014 1:10 AM
> *To:* user@spark.apache.org
> *Subject:* Re: Comparative study
>
>
>
> From a development perspective, I vastly prefer Spark to MapReduce. The
> MapReduce API is very constrained; Spark's API feels much more natural to
> me. Testing and local development are also very easy - creating a local
> Spark context is trivial and it reads local files. For your unit tests you
> can just have them create a local context and execute your flow with some
> test data. Even better, you can do ad-hoc work in the Spark shell and if
> you want that in your production code it will look exactly the same.
>
> Unfortunately, the picture isn't so rosy when it gets to production. In my
> experience, Spark simply doesn't scale to the volumes that MapReduce will
> handle. Not with a Standalone cluster anyway - maybe Mesos or YARN would be
> better, but I haven't had the opportunity to try them. I find jobs tend to
> just hang forever for no apparent reason on large data sets (but smaller
> than what I push through MapReduce).
>
> I am hopeful the situation will improve - Spark is developing quickly -
> but if you have large amounts of data you should proceed with caution.
>
> Keep in mind there are some frameworks for Hadoop which can hide the ugly
> MapReduce with something very similar in form to Spark's API; e.g. Apache
> Crunch. So you might consider those as well.
>
> (Note: the above is with Spark 1.0.0.)
>
>
>
>
>
> On Mon, Jul 7, 2014 at 11:07 AM, <santosh.viswanat...@accenture.com>
> wrote:
>
> Hello Experts,
>
>
>
> I am doing a comparative study of the following:
>
>
>
> Spark vs Impala
>
> Spark vs MapReduce. Is it worth migrating from an existing MR implementation
> to Spark?
>
>
>
>
>
> Please share your thoughts and expertise.
>
>
>
>
>
> Thanks,
> Santosh
>
>
>
>
> --
>
> Daniel Siegmann, Software Developer
> Velos
>
> Accelerating Machine Learning
>
>
> 440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
> E: daniel.siegm...@velos.io W: www.velos.io
>
