On Thu, Sep 1, 2016 at 3:15 PM, darren <dar...@ontrenet.com> wrote:
> This topic is a concern for us as well. In the data science world no one
> uses native Scala or Java by choice. It's R and Python, and Python is
> growing. Yet in Spark, Python is third in line for feature support, if at all.
>
> This is why we have decoupled from Spark in our project. It's really
> unfortunate the Spark team has invested so heavily in Scala.
>
> As for speed, it comes from horizontal scaling and throughput. When you can
> scale outward, individual VM performance is less of an issue. Basic HPC
> principles.

You could still try to get the best of both worlds: have your data scientists write their algorithms in Python and/or R, and let a compiler/optimizer handle the optimizations needed to run them in a distributed fashion on a Spark cluster, leveraging the low-level APIs written in Java/Scala. Take a look at Apache SystemML (http://systemml.apache.org/) for more details.
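To give a feel for that approach, here is a minimal sketch of what an algorithm looks like in SystemML's R-like DML (a hypothetical example, not taken from any SystemML distribution): the data scientist writes ordinary matrix code with no Spark constructs, and SystemML's optimizer decides whether each operation runs single-node or as distributed Spark jobs.

```
# Least-squares fit via the normal equations, in SystemML DML.
# The script contains no cluster-specific code; SystemML's
# compiler plans the distributed execution on Spark.
X = rand(rows=10000, cols=10)         # placeholder feature matrix
y = rand(rows=10000, cols=1)          # placeholder labels
w = solve(t(X) %*% X, t(X) %*% y)     # closed-form solution
write(w, "weights.csv", format="csv")
```

The same script can also be driven from Python or Scala through SystemML's MLContext API, which is how it would typically be embedded in an existing Spark pipeline.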
--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/