I'm working on a wrapper for Apache Spark in Julia. Essentially, the 
workflow looks like this: 

 1. The Julia driver creates an instance of the `JuliaRDD` class in the JVM 
and passes a serialized Julia function to it. 
 2. Spark core copies the `JuliaRDD` to each machine in the cluster and runs 
its `.compute()` method.
 3. `JuliaRDD.compute()` starts a new Julia process and invokes the function 
`launch_worker`.
 4. The launched worker reads and deserializes the original function and 
applies it to a local chunk of data. 
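Steps 1 and 4 can be sketched with Julia's standard `Serialization` module. This is only an illustration of the serialize/deserialize round trip, not the actual `JuliaRDD` wire protocol; in the real setup the bytes would travel over the network rather than stay in one process.

```julia
using Serialization

# "Driver" side: serialize an anonymous function into a byte buffer.
buf = IOBuffer()
serialize(buf, x -> 2x)
bytes = take!(buf)

# "Worker" side: reconstruct the function from the raw bytes and
# apply it to a local chunk of data.
f = deserialize(IOBuffer(bytes))
chunk = [1, 2, 3]
result = map(f, chunk)
```

Within a single process this round trip works; the open question is exactly what the serialized bytes rely on when the deserializing process is a fresh one on another host.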

So the workers are not managed by any kind of Julia `ClusterManager` and in 
general know nothing about the definitions in the main driver program. The 
only two pieces of information they have are the serialized function and the 
data to process. 

My question is: does Julia's serialization produce completely 
self-contained code that can be run on workers? In other words, is it 
possible to send a serialized function over the network to another host / 
Julia process and apply it there without any additional information from 
the first process? 

I ran some tests on a single machine: when I defined the function without 
`@everywhere`, the worker failed with the message "function myfunc not 
defined on process 1". With `@everywhere` my code worked, but will it work 
across multiple hosts with essentially independent Julia processes? 
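For reference, the single-machine test above looks roughly like the following, using the standard `Distributed` API (the function names here are placeholders, not the ones from the actual project):

```julia
using Distributed
addprocs(1)   # spawn one local worker (it gets id 2)

# Defined only on process 1: a named function is serialized by name,
# so a remote call would fail with "myfunc not defined" on the worker.
myfunc(x) = x + 1
# remotecall_fetch(myfunc, 2, 1)   # would throw on the worker

# @everywhere evaluates the definition on every process, so the name
# resolves on the worker and the remote call succeeds.
@everywhere myfunc2(x) = x + 1
r = remotecall_fetch(myfunc2, 2, 1)
```

Note that `@everywhere` only reaches processes that `Distributed` already knows about, which is exactly what is missing in the Spark setup where workers are launched outside any `ClusterManager`.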

