You can launch one permanent Spark context and then execute your jobs
within it. Since the jobs all run in the same context, they can share
data easily.

These two projects provide the functionality that you need:
https://github.com/spark-jobserver/spark-jobserver#persistent-context-mode---faster--required-for-related-jobs
https://github.com/cloudera/livy#post-sessions
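
Whichever one you pick, the idea is the same: every job talks to the same
SparkSession, so A and B can publish their results (cached temp views work
fine) and C can join them without anything hitting disk. Here is a rough
sketch of that in Scala; the paths, the "id" column and the view names are
made up, and in practice each block would be a separate job submitted
against the shared context rather than one main():

import org.apache.spark.sql.SparkSession

object SharedContextSketch {
  def main(args: Array[String]): Unit = {
    // In Livy or spark-jobserver the session comes from the shared context;
    // built here only so the sketch stands on its own.
    val spark = SparkSession.builder().appName("shared-context-sketch").getOrCreate()

    // "Program A": compute a result and publish it inside the shared context.
    val a = spark.read.parquet("/data/input_a")   // hypothetical path
    a.cache()
    a.createOrReplaceTempView("result_a")

    // "Program B": same idea on another data set.
    val b = spark.read.parquet("/data/input_b")   // hypothetical path
    b.cache()
    b.createOrReplaceTempView("result_b")

    // "Program C": combine A and B without writing anything to disk.
    val c = spark.sql(
      "SELECT a.*, b.* FROM result_a a JOIN result_b b ON a.id = b.id")
    c.show()

    spark.stop()
  }
}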

On Tue, Jun 20, 2017 at 1:46 PM, Jean Georges Perrin <j...@jgp.net> wrote:

> Hey,
>
> Here is my need: program A does something on a set of data and produces
> results, program B does that on another set, and finally, program C
> combines the data of A and B. Of course, the easy way is to dump everything
> to disk after A and B are done, but I wanted to avoid that.
>
> I was thinking of creating a temp view, but I do not really like the temp
> aspect of it ;). Any ideas? (They are all worth sharing.)
>
> jg
>
>
>
>
