Hi Spark Devs,

If you could pick one language binding to add to Spark, what would it be? Probably Clojure or JRuby, if the JVM is of interest.
I'm quite excited about Julia as a language for scientific computing (http://julialang.org). The Julia community has been very focused on interop with R, Matlab, and especially Python (see https://github.com/stevengj/PyCall.jl and https://github.com/stevengj/PyPlot.jl, for example).

Anyway, this is a bit of a thought experiment, but I'd imagine a Julia API would be similar in principle to the Python API. On the Spark/Java side it would likely be almost the same. On the Julia side, I'd imagine the major sticking point would be serialization (i.e. a Julia equivalent of the cloudpickle code PySpark uses to ship closures).

I actually played around with PyCall and was able to call PySpark from the Julia console (a rough transcript is in the P.S. below). You can run arbitrary PySpark code from Julia (though the syntax is a bit ugly), and it mostly seemed to work. However, when I tried to pass in a Julia function or closure, it failed at the serialization step.

So one option would be to figure out how to serialize the required things on the Julia side and use PyCall for the interop. The Julia <-> Python <-> Java hops could add a fair bit of overhead, so it's perhaps not worth it, but the idea of using Spark for the distributed computing part while mixing and matching Python and Julia code/libraries for things like stats/machine learning is still very appealing!

Thoughts?

Nick
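
P.S. To make the "call PySpark from the Julia console" part concrete, here's roughly what my session looked like. This is a minimal sketch, not a proper binding: the "local[2]" master and app name are just placeholders, and it assumes PySpark is importable from the Python that PyCall is built against, plus a recent PyCall where Python attributes are accessed with dot syntax.

    using PyCall

    # Import PySpark through Python and start a local context.
    pyspark = pyimport("pyspark")
    sc = pyspark.SparkContext("local[2]", "julia-pyspark-test")

    rdd = sc.parallelize(collect(1:10))

    # This works: the closure is defined on the Python side, so
    # PySpark's cloudpickle serializer can ship it to the workers.
    squares = rdd.map(py"lambda x: x * x")
    println(squares.collect())   # => [1, 4, 9, ..., 100]

    # This is where it falls over: a Julia closure crosses into
    # Python as an opaque wrapped callable that cloudpickle can't
    # serialize, so the job dies at the serialization step.
    # rdd.map(x -> x * x).collect()

    sc.stop()

The Python-lambda version works precisely because the function never leaves Python; the Julia-side serialization I mentioned above is what would be needed to make the commented-out line work.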