> > 1) simply wrap the Spark java API via JavaCall. This is the low level approach. BTW I've experimented with javaCall and found it was unstable & also lacking functionality (e.g. there's no way to shutdown the jvm or create a pool of JVM analogous to DB connections) so that might need some work before trying the Spark integration.
Using JavaCall is not an option, especially now that the JVM has become closed-source; see https://github.com/aviks/JavaCall.jl/issues/7. The Python bindings are done through Py4J, which is RPC to the JVM. If you look at sparkR <https://github.com/apache/spark/tree/master/R>, it is done the same way: sparkR uses an RPC interface to communicate with a Netty-based Spark JVM backend that translates R calls into JVM calls, keeps the SparkContext on the JVM side, and ships serialized data to and from R. So it is just a matter of writing a Julia RPC client for the JVM and wrapping the necessary Spark methods in a Julia-friendly way.
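
To make the RPC idea concrete, here is a rough sketch of the pattern in Python (picked only because Py4J came up; a Julia client would have the same shape). It is not the Py4J or sparkR wire protocol. The JSON-lines format, `FakeContext`, `sum_values`, `Connection` and `RemoteObject` are all made up for illustration, and the "backend" is a local Python thread standing in for the JVM. The point is only that the real object stays on the backend, the client holds an integer handle, and each method call is shipped over a socket as a serialized request.

```python
import json
import socket
import socketserver
import threading


class FakeContext:
    """Hypothetical stand-in for an object that lives only on the backend."""

    def __init__(self, app_name):
        self.app_name = app_name

    def sum_values(self, values):
        # Made-up method; a real backend would dispatch to Spark here.
        return sum(values)


class BackendHandler(socketserver.StreamRequestHandler):
    # Backend-side registry: clients only ever see the integer handles.
    objects = {}
    next_id = 0
    lock = threading.Lock()

    def handle(self):
        # One JSON request per line, one JSON reply per line.
        for line in self.rfile:
            req = json.loads(line)
            if req["op"] == "new":
                obj = FakeContext(*req["args"])
                with BackendHandler.lock:
                    handle = BackendHandler.next_id
                    BackendHandler.next_id += 1
                    BackendHandler.objects[handle] = obj
                reply = {"handle": handle}
            else:  # "call": look up the object and invoke the named method
                target = BackendHandler.objects[req["handle"]]
                result = getattr(target, req["method"])(*req["args"])
                reply = {"result": result}
            self.wfile.write((json.dumps(reply) + "\n").encode())


class Connection:
    """Client side of the socket: sends a request, waits for the reply."""

    def __init__(self, host, port):
        self._sock = socket.create_connection((host, port))
        self._file = self._sock.makefile("rw")

    def request(self, payload):
        self._file.write(json.dumps(payload) + "\n")
        self._file.flush()
        return json.loads(self._file.readline())

    def new(self, *args):
        handle = self.request({"op": "new", "args": list(args)})["handle"]
        return RemoteObject(self, handle)

    def close(self):
        self._file.close()
        self._sock.close()


class RemoteObject:
    """Client-side proxy: attribute access becomes a remote method call."""

    def __init__(self, conn, handle):
        self._conn = conn
        self._handle = handle

    def __getattr__(self, name):
        def remote_call(*args):
            reply = self._conn.request(
                {"op": "call", "handle": self._handle,
                 "method": name, "args": list(args)}
            )
            return reply["result"]
        return remote_call


def demo():
    # Port 0 lets the OS pick a free port for the stand-in backend.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), BackendHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = Connection(*server.server_address)
    try:
        ctx = conn.new("demo-app")           # object is created on the backend
        return ctx.sum_values([1, 2, 3, 4])  # call is executed on the backend
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The "Julia-friendly wrapping" part would correspond to `RemoteObject` here: a thin client-side layer that hides the handles and the serialization behind ordinary-looking method calls.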
