You can launch one permanent, long-running Spark context and then execute all of your jobs within it. Since they run in the same context, they can share data directly, for example through DataFrames cached in memory, instead of writing intermediate results to disk.
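As a rough sketch of the Livy route (https://github.com/cloudera/livy), you create one interactive session with `POST /sessions` and then submit programs A, B and C as separate statements with `POST /sessions/{id}/statements`. Every statement runs in the same SparkContext and interpreter, so C can use what A and B left in memory. The server URL, the input paths and the code for A, B and C below are hypothetical placeholders, and the snippet only builds the requests without sending them.

```python
import json
import urllib.request

# Hypothetical Livy endpoint; adjust to your deployment.
LIVY_URL = "http://livy-host:8998"


def create_session_request(kind="pyspark"):
    """Build the POST /sessions request that starts one long-lived Spark context."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        f"{LIVY_URL}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def statement_request(session_id, code):
    """Build a POST /sessions/{id}/statements request that runs code in that session."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        f"{LIVY_URL}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical code for programs A, B and C. Because all statements run in the
# same session, the DataFrames cached by A and B are still available when C runs.
PROGRAM_A = "a = spark.read.parquet('/data/set_a').cache(); a.count()"
PROGRAM_B = "b = spark.read.parquet('/data/set_b').cache(); b.count()"
PROGRAM_C = "c = a.join(b, 'key'); c.show()"


def plan(session_id):
    """Return the statement requests for A, B and C, in submission order."""
    return [statement_request(session_id, code) for code in (PROGRAM_A, PROGRAM_B, PROGRAM_C)]
```

To actually run it, pass each request to `urllib.request.urlopen`, read the session `id` from the JSON returned by `POST /sessions`, and, to be safe, wait until the session reports the `idle` state before submitting statements. Each statement's output can then be fetched with `GET /sessions/{id}/statements/{statementId}`.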
These two projects provide the functionality that you need:
https://github.com/spark-jobserver/spark-jobserver#persistent-context-mode---faster--required-for-related-jobs
https://github.com/cloudera/livy#post-sessions

On Tue, Jun 20, 2017 at 1:46 PM, Jean Georges Perrin <j...@jgp.net> wrote:
> Hey,
>
> Here is my need: program A does something on a set of data and produces
> results, program B does that on another set, and finally, program C
> combines the data of A and B. Of course, the easy way is to dump all on
> disk after A and B are done, but I wanted to avoid this.
>
> I was thinking of creating a temp view, but I do not really like the temp
> aspect of it ;). Any idea (they are all worth sharing)
>
> jg