On Thu, Sep 1, 2016 at 3:15 PM, darren <dar...@ontrenet.com> wrote:

> This topic is a concern for us as well. In the data science world no one
> uses native Scala or Java by choice. It's R and Python, and Python is
> growing. Yet in Spark, Python is third in line for feature support, if at all.
>
> This is why we have decoupled from Spark in our project. It's really
> unfortunate the Spark team has invested so heavily in Scala.
>
> As for speed, it comes from horizontal scaling and throughput. When you
> can scale outward, individual VM performance is less of an issue. Basic HPC
> principles.
>
>
You could still try to get the best of both worlds: have your data
scientists write their algorithms in Python and/or R, and have a
compiler/optimizer handle the optimizations needed to run in a distributed
fashion on a Spark cluster, leveraging some of the low-level APIs written in
Java/Scala. Take a look at Apache SystemML http://systemml.apache.org/ for
more details.
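To illustrate the kind of script this workflow targets: a data scientist writes plain linear algebra (here ordinary least squares via the normal equations, shown locally in NumPy as a sketch; the dataset and numbers are invented for the example), and a compiler/optimizer like SystemML can rewrite an equivalent R-like script into distributed Spark operations, so the author never touches Scala.

```python
import numpy as np

# Toy dataset: y = 2*x0 + 3*x1 with no noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = X @ np.array([2.0, 3.0])

# Ordinary least squares via the normal equations:
#   beta = (X^T X)^{-1} X^T y
# The same few lines, written in an R-like DSL, are what a
# compiler such as SystemML can plan and execute across a cluster.
beta = np.linalg.solve(X.T @ X, X.T @ y)

print(np.round(beta, 6))
```

On this noiseless toy data the solve recovers the true coefficients [2.0, 3.0]; the point is that the algorithm is expressed entirely in the data scientist's language, with distribution handled below it.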



-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
