Julia bindings for Spark would provide much more than just RDDs: they would give us access to Spark's other big data components for streaming, machine learning, SQL, and more.
On Friday, April 17, 2015 at 12:54:32 AM UTC+3, [email protected] wrote:
>
> However, I wonder, how hard it would be to implement RDD in Julia? It
> looks straight forward from a RDD paper
> <https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf> how to
> implement it. It is a robust abstraction that can be used in any parallel
> computation.
>
> On Thursday, April 16, 2015 at 3:32:32 AM UTC-4, Steven Sagaert wrote:
>>
>> yes that's a solid approach. For my personal julia - java integrations I
>> also run the JVM in a separate process.
>>
>> On Wednesday, April 15, 2015 at 9:30:28 PM UTC+2, [email protected] wrote:
>>>
>>> 1) simply wrap the Spark java API via JavaCall. This is the low level
>>>> approach. BTW I've experimented with javaCall and found it was unstable &
>>>> also lacking functionality (e.g. there's no way to shutdown the jvm or
>>>> create a pool of JVM analogous to DB connections) so that might need some
>>>> work before trying the Spark integration.
>>>>
>>>
>>> Using JavaCall is not an option, especially when JVM became
>>> close-sourced, see https://github.com/aviks/JavaCall.jl/issues/7.
>>>
>>> Python bindings are done through Py4J, which is RPC to JVM. If you look
>>> at the sparkR <https://github.com/apache/spark/tree/master/R>, it is
>>> done in a same way. sparkR uses a RPC interface to communicate with a
>>> Netty-based Spark JVM backend that translates R calls into JVM calls, keeps
>>> SparkContext on a JVM side, and ships serialized data to/from R.
>>>
>>> So it is just a matter of writing Julia RPC to JVM and wrapping
>>> necessary Spark methods in a Julia friendly way.
>>>
>>
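
For anyone who wants to see the RPC pattern from the quoted message in concrete form (a backend process that owns the real objects, and a client that only holds handles and sends serialized method calls), here is a minimal sketch in Python. It is not how Py4J or sparkR are actually implemented. `Backend`, `RemoteRef`, `ToyContext` and `ToyDataset` are hypothetical names made up for illustration, the "wire" is just a JSON byte string passed in-process where a real binding would use a socket, and the toy classes stand in for the JVM-side SparkContext and RDD.

```python
import json


class Backend:
    """Stands in for the JVM side: owns the real objects and executes calls."""

    def __init__(self):
        self._objects = {}
        self._next_id = 0

    def register(self, obj):
        # Keep the object on the backend and hand out an opaque id for it.
        obj_id = f"obj{self._next_id}"
        self._next_id += 1
        self._objects[obj_id] = obj
        return obj_id

    def handle(self, request_bytes):
        # Decode the request, invoke the method, and encode the reply.
        request = json.loads(request_bytes)
        target = self._objects[request["target"]]
        result = getattr(target, request["method"])(*request["args"])
        if isinstance(result, (int, float, str, list, type(None))):
            reply = {"kind": "value", "value": result}  # plain data is shipped back
        else:
            reply = {"kind": "ref", "id": self.register(result)}  # objects stay here
        return json.dumps(reply).encode()


class RemoteRef:
    """Client-side handle to an object that lives on the backend."""

    def __init__(self, backend, obj_id):
        self._backend = backend
        self._id = obj_id

    def call(self, method, *args):
        request = json.dumps(
            {"target": self._id, "method": method, "args": list(args)}
        ).encode()
        reply = json.loads(self._backend.handle(request))  # a socket in real life
        if reply["kind"] == "ref":
            return RemoteRef(self._backend, reply["id"])
        return reply["value"]


# Hypothetical stand-ins for the backend-side context and dataset.
class ToyDataset:
    def __init__(self, data):
        self._data = list(data)

    def count(self):
        return len(self._data)

    def take(self, n):
        return self._data[:n]


class ToyContext:
    def parallelize(self, data):
        return ToyDataset(data)


backend = Backend()
ctx = RemoteRef(backend, backend.register(ToyContext()))
ds = ctx.call("parallelize", [1, 2, 3, 4])  # returns a handle, not the data
count = ds.call("count")
first_two = ds.call("take", 2)
```

The point of the design is that the context and dataset never leave the backend: the client only ever sees ids and small serialized results, which is why a Julia wrapper would mostly consist of friendly functions around `call`.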
