+1 Great idea. My two cents: it would be nice (as an option) if SparkOperator could keep the Spark context open between calls, since creating a new context takes 30+ seconds on our cluster. Not sure how well that fits the Airflow architecture, though.
--
Ruslan Dautkhanov

On Sat, Mar 18, 2017 at 3:45 PM, Russell Jurney <[email protected]> wrote:

> What do people think about creating a SparkOperator that uses spark-submit
> to submit jobs? Would work for Scala/Java Spark and PySpark. The patterns
> outlined in my presentation on Airflow and PySpark
> <http://bit.ly/airflow_pyspark> would fit well inside an Operator, I
> think.
>
> BashOperator works, but why not tailor something to spark-submit?
>
> I'm open to doing the work, but I wanted to see what people thought about
> it and get feedback about things they would like to see in SparkOperator
> and get any pointers people had to doing the implementation.
>
> Russell Jurney @rjurney <http://twitter.com/rjurney>
> [email protected] LI <http://linkedin.com/in/russelljurney> FB
> <http://facebook.com/jurney> datasyndrome.com
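For concreteness, here is a minimal sketch of the command-building core such a SparkOperator would need. This is not an existing Airflow API: the function name and all parameter names (`application`, `master`, `deploy_mode`, `conf`) are illustrative. The actual operator would subclass Airflow's `BaseOperator` and run this command via a subprocess in its `execute()` method.

```python
def build_spark_submit_cmd(application, master=None, deploy_mode=None,
                           conf=None, application_args=None):
    """Build a spark-submit command line as a list of arguments.

    Illustrative sketch only: a real SparkOperator would wrap this in
    an Airflow BaseOperator subclass and invoke it with subprocess.
    """
    cmd = ["spark-submit"]
    if master:
        cmd += ["--master", master]          # e.g. yarn, local[*]
    if deploy_mode:
        cmd += ["--deploy-mode", deploy_mode]  # client or cluster
    # spark-submit accepts repeated --conf key=value pairs
    for key, value in (conf or {}).items():
        cmd += ["--conf", "{}={}".format(key, value)]
    cmd.append(application)                  # the .py file or jar
    cmd += list(application_args or [])      # args passed to the job
    return cmd
```

Building the command as a list (rather than a shell string, as one would in BashOperator) avoids shell-quoting issues when job arguments contain spaces.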
