However, I wonder how hard it would be to implement RDDs (Resilient 
Distributed Datasets) in Julia. Judging from the RDD paper 
<https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf>, the 
implementation looks straightforward. It is a robust abstraction that can be 
applied to many kinds of parallel computation.
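
The core idea from the paper is a read-only, partitioned dataset that records how it was derived (its lineage) and is evaluated lazily only when an action runs. The sketch below shows that idea in a few lines. It is written in Python purely for illustration and runs in a single process. The class name `ToyRDD` and its methods are hypothetical: they borrow Spark's vocabulary (`parallelize`, `map`, `filter`, `collect`) but are not Spark's implementation. Wide (shuffle) dependencies, caching, scheduling, and distribution are all left out.

```python
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Toy, single-process RDD: a read-only partitioned dataset that records
    how it was derived (its lineage) instead of materializing its contents."""

    def __init__(self, compute: Callable[[int], List], num_partitions: int,
                 parent: Optional["ToyRDD"] = None, op: str = "source"):
        self._compute = compute        # partition index -> list of elements
        self.num_partitions = num_partitions
        self.parent = parent           # lineage link (None for a source RDD)
        self.op = op                   # name of the transformation that built it

    @staticmethod
    def parallelize(data: Iterable, num_partitions: int) -> "ToyRDD":
        items = list(data)

        def compute(i: int) -> List:
            # Contiguous, roughly equal slices of the input.
            n = len(items)
            return items[i * n // num_partitions:(i + 1) * n // num_partitions]

        return ToyRDD(compute, num_partitions)

    # Narrow transformations: each output partition depends on one parent
    # partition. They only build a new lineage node; nothing runs yet.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(lambda i: [f(x) for x in self._compute(i)],
                      self.num_partitions, self, "map")

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda i: [x for x in self._compute(i) if pred(x)],
                      self.num_partitions, self, "filter")

    # Action: evaluate every partition by replaying the lineage. A lost
    # partition i could be rebuilt the same way, by calling _compute(i) again.
    def collect(self) -> List:
        out: List = []
        for i in range(self.num_partitions):
            out.extend(self._compute(i))
        return out

    def lineage(self) -> List[str]:
        node, ops = self, []
        while node is not None:
            ops.append(node.op)
            node = node.parent
        return list(reversed(ops))


rdd = ToyRDD.parallelize(range(10), num_partitions=3)
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(evens_squared.collect())  # [0, 4, 16, 36, 64]
print(evens_squared.lineage())  # ['source', 'filter', 'map']
```

Because each RDD stores only a compute function and a pointer to its parent, fault tolerance comes from recomputation rather than replication, which is the property the paper relies on.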

On Thursday, April 16, 2015 at 3:32:32 AM UTC-4, Steven Sagaert wrote:
>
> Yes, that's a solid approach. For my personal Julia-Java integrations I 
> also run the JVM in a separate process.
>
> On Wednesday, April 15, 2015 at 9:30:28 PM UTC+2, [email protected] wrote:
>>
>>> 1) Simply wrap the Spark Java API via JavaCall. This is the low-level 
>>> approach. BTW, I've experimented with JavaCall and found it was unstable and 
>>> also lacking functionality (e.g. there's no way to shut down the JVM or 
>>> create a pool of JVMs analogous to DB connections), so that might need some 
>>> work before trying the Spark integration.
>>>
>>
>> Using JavaCall is not an option, especially since the JVM became 
>> closed-source; see https://github.com/aviks/JavaCall.jl/issues/7.
>>
>> The Python bindings are built on Py4J, which provides RPC to the JVM. SparkR 
>> <https://github.com/apache/spark/tree/master/R> is done the same way: it 
>> uses an RPC interface to communicate with a Netty-based Spark JVM backend 
>> that translates R calls into JVM calls, keeps the SparkContext on the JVM 
>> side, and ships serialized data to and from R.
>>
>> So it is just a matter of writing a Julia RPC client for the JVM and 
>> wrapping the necessary Spark methods in a Julia-friendly way.
>>
>
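
The RPC approach described above (a separate backend process that owns the Spark objects and receives serialized calls over a socket) can be sketched as follows. This is a toy under stated assumptions, written in Python only for illustration: the wire format (a 4-byte big-endian length prefix followed by a JSON body), the `new`/`count` methods, and the names `FakeBackend` and `Client` are all hypothetical. It is not the actual Py4J or SparkR protocol, and a plain Python list stands in for a JVM object such as the SparkContext.

```python
import json
import socket
import struct
import threading


# Hypothetical framing: 4-byte big-endian length, then a UTF-8 JSON body.
def send_msg(sock: socket.socket, obj) -> None:
    body = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock: socket.socket):
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


class FakeBackend:
    """Stands in for the JVM side: owns the real objects and hands the
    client opaque integer handles instead of the objects themselves."""

    def __init__(self):
        self.objects = {}
        self.next_id = 0

    def dispatch(self, req):
        if req["method"] == "new":      # construct an object, return a handle
            self.next_id += 1
            self.objects[self.next_id] = list(req["args"])
            return {"ok": True, "handle": self.next_id}
        if req["method"] == "count":    # call a method on a held object
            return {"ok": True, "value": len(self.objects[req["handle"]])}
        return {"ok": False, "error": "unknown method " + req["method"]}

    def serve(self, sock: socket.socket, n_requests: int) -> None:
        for _ in range(n_requests):
            send_msg(sock, self.dispatch(recv_msg(sock)))


class Client:
    """Client side (the role a Julia binding would play): every call is
    serialized, sent to the backend, and answered remotely."""

    def __init__(self, sock: socket.socket):
        self.sock = sock

    def call(self, method, handle=None, args=()):
        send_msg(self.sock, {"method": method, "handle": handle, "args": list(args)})
        reply = recv_msg(self.sock)
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        return reply


client_sock, backend_sock = socket.socketpair()
server = threading.Thread(target=FakeBackend().serve, args=(backend_sock, 2))
server.start()
client = Client(client_sock)
h = client.call("new", args=[1, 2, 3])["handle"]
n = client.call("count", handle=h)["value"]
server.join()
client_sock.close()
backend_sock.close()
print(h, n)  # 1 3
```

The design choice worth noting is that object state lives only on the backend; the client holds integer handles, and only method names, arguments, and results cross the wire.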
