Ruslan, thanks for your feedback. You mean the spark-submit context? Or like the SparkContext and SparkSession? I don't think we could keep that alive, because it wouldn't work out with multiple calls to spark-submit. I do feel your pain, though. Maybe someone else can see how this might be done?
If SparkContext was able to open the spark/pyspark console, then multiple job submissions would be possible. I didn't have this in mind, but an InteractiveSparkContext or SparkConsoleContext might be able to do this?

Russell Jurney @rjurney <http://twitter.com/rjurney> [email protected] LI <http://linkedin.com/in/russelljurney> FB <http://facebook.com/jurney> datasyndrome.com

On Sat, Mar 18, 2017 at 3:02 PM, Ruslan Dautkhanov <[email protected]> wrote:

> +1 Great idea.
>
> my two cents - it would be nice (as an option) if SparkOperator would be
> able to keep the context open between different calls,
> as it takes 30+ seconds to create a new context (on our cluster). Not sure
> how well it fits the Airflow architecture.
>
> --
> Ruslan Dautkhanov
>
> On Sat, Mar 18, 2017 at 3:45 PM, Russell Jurney <[email protected]>
> wrote:
>
> > What do people think about creating a SparkOperator that uses spark-submit
> > to submit jobs? It would work for Scala/Java Spark and PySpark. The patterns
> > outlined in my presentation on Airflow and PySpark
> > <http://bit.ly/airflow_pyspark> would fit well inside an Operator, I think.
> > BashOperator works, but why not tailor something to spark-submit?
> >
> > I'm open to doing the work, but I wanted to see what people thought about
> > it and get feedback about things they would like to see in SparkOperator,
> > and get any pointers people had on doing the implementation.
> >
> > Russell Jurney @rjurney <http://twitter.com/rjurney>
> > [email protected] LI <http://linkedin.com/in/russelljurney> FB
> > <http://facebook.com/jurney> datasyndrome.com
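For what it's worth, here is a minimal sketch of the spark-submit wrapping piece. The function name and parameters are hypothetical, just to illustrate how a SparkOperator might assemble the command line before handing it to a subprocess; it works for Scala/Java jars and PySpark scripts alike, since spark-submit accepts either as the application:

```python
import shlex

def build_spark_submit_cmd(application, master=None, deploy_mode=None,
                           conf=None, application_args=None,
                           spark_submit="spark-submit"):
    """Assemble a spark-submit invocation as an argument list.

    Hypothetical helper: a SparkOperator's execute() could build the
    command this way and then run it with subprocess.Popen, streaming
    the driver logs back into the Airflow task log.
    """
    cmd = [spark_submit]
    if master:
        cmd += ["--master", master]
    if deploy_mode:
        cmd += ["--deploy-mode", deploy_mode]
    # Each entry in `conf` becomes a --conf key=value pair.
    for key, value in (conf or {}).items():
        cmd += ["--conf", "{}={}".format(key, value)]
    cmd.append(application)            # the jar or .py file to run
    cmd += list(application_args or [])
    return cmd

cmd = build_spark_submit_cmd(
    "jobs/etl.py",
    master="yarn",
    conf={"spark.executor.memory": "4g"},
    application_args=["--date", "2017-03-18"],
)
print(shlex.join(cmd))
# spark-submit --master yarn --conf spark.executor.memory=4g jobs/etl.py --date 2017-03-18
```

Keeping it as an argument list (rather than one shell string, as with BashOperator) avoids quoting bugs, and the operator could expose master, conf, etc. as templated fields.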
