Hi Spark Devs

If you could pick one language binding to add to Spark, what would it be?
Probably Clojure or JRuby if the JVM is of interest.

I'm quite excited about Julia as a language for scientific computing (
http://julialang.org). The Julia community has been very focused on
interop with R, Matlab, and especially Python (see
https://github.com/stevengj/PyCall.jl and
https://github.com/stevengj/PyPlot.jl, for example).

Anyway, this is a bit of a thought experiment, but I'd imagine a Julia API
would be similar in principle to the Python API. On the Spark Java side, it
would likely be almost the same. On the Julia side, I'd imagine the major
sticking point would be serialization (i.e. an equivalent of the
cloudpickle code PySpark uses to ship closures to workers).

I actually played around with PyCall and was able to call PySpark from the
Julia console. You can run arbitrary PySpark code (though the
attribute-access syntax is a bit ugly), and it mostly seemed to work.
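
For the curious, here's roughly what that looks like (just a sketch: it
assumes PyCall is set up against a Python that can import pyspark, and
uses PyCall's dot-style attribute access):

    using PyCall

    # Import PySpark through the Python installation PyCall is
    # configured to use.
    pyspark = pyimport("pyspark")

    sc = pyspark.SparkContext("local[2]", "julia-pyspark")

    # Driver-side operations work fine: the data crosses the
    # Julia/Python boundary as plain Python objects, and no user
    # function has to be shipped to the workers.
    rdd = sc.parallelize([1, 2, 3, 4, 5])
    println(rdd.count())  # 5
    println(rdd.sum())    # 15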

However, when I tried to pass a Julia function or closure to an RDD
operation, it failed at the serialization step.
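
Concretely, continuing from the snippet above, something like this is
where it falls over (the failure shows up when PySpark tries to pickle
the PyCall wrapper around the Julia closure):

    # PyCall wraps the Julia closure as an opaque Python callable,
    # which pickle/cloudpickle can't serialize, so the job dies once
    # an action forces the mapped function to be shipped.
    double = x -> 2x
    doubled = rdd.map(double)
    doubled.collect()  # fails: the wrapped Julia function isn't picklable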

So one option would be to figure out how to serialize the required things
on the Julia side and use PyCall for the interop. The Julia <-> Python <->
Java hops could add a fair bit of overhead, so it's perhaps not worth it,
but the idea of using Spark for the distributed computing part while
mixing and matching Python and Julia code/libraries for things like stats
and machine learning is very appealing!
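
The raw ingredient does exist on the Julia side: the stdlib serializer
can round-trip closures to bytes. A minimal sketch of that piece (a real
binding would still need cloudpickle-style machinery on top, so a worker
can reconstruct a function it has never seen):

    using Serialization

    # Round-trip a closure through bytes with Julia's built-in
    # serializer; this is essentially what Distributed.jl builds on.
    f = let factor = 3
        x -> factor * x
    end

    buf = IOBuffer()
    serialize(buf, f)
    bytes = take!(buf)  # what would go over the wire to a worker

    g = deserialize(IOBuffer(bytes))
    @assert g(7) == 21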

Thoughts?

Nick
