Thanks Bolke. That's awesome.

1)
So each task would create its own Spark session?
Is there a way to share a Spark session between tasks, as discussed in this
email chain?
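As I understand it, each spark-submit invocation starts its own driver JVM (and thus its own session), which is why sharing across tasks is hard. A rough sketch of the per-task submission command an operator would build (the helper name and defaults here are illustrative, not an actual Airflow API):

```python
def build_spark_submit_cmd(app, master="yarn", deploy_mode="cluster", conf=None):
    """Assemble a spark-submit argv list.

    Every invocation of this command launches a fresh driver, so each
    Airflow task gets its own Spark session.
    """
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd

# e.g. the command a hypothetical SparkOperator task might run:
print(build_spark_submit_cmd("etl_job.py", conf={"spark.executor.memory": "4g"}))
```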

2)
It looks like SparkSqlHook calls the `spark-sql` shell with all those parameters?

https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/spark_sql_hook.py#L88

This probably will not work with Cloudera's distribution of Spark.
I think they stopped shipping `spark-sql` as of CDH 5.4.
Perhaps `spark-sql` is not included because CDH Spark lacks the Thrift
service, or for some other reason.
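Since the hook just shells out, a DAG could at least fail fast when the binary is missing; a minimal sketch (this guard is my own suggestion, not part of SparkSqlHook):

```python
import shutil

def spark_sql_available() -> bool:
    """Return True if the `spark-sql` shell is on PATH.

    SparkSqlHook shells out to `spark-sql`, so on distributions that
    omit it (reportedly CDH >= 5.4) the hook cannot work.
    """
    return shutil.which("spark-sql") is not None

if not spark_sql_available():
    print("spark-sql not found on PATH; SparkSqlHook would fail here")
```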

Thank you.



-- 
Ruslan Dautkhanov

On Sat, Mar 18, 2017 at 8:24 PM, Bolke de Bruin <[email protected]> wrote:

> A spark operator exists as of 1.8.0 (which will be released tomorrow), you
> might want to take a look at that. I know that an update is coming to that
> operator that improves communication with Yarn.
>
> Bolke
>
> > On 18 Mar 2017, at 18:43, Russell Jurney <[email protected]>
> wrote:
> >
> > Ruslan, thanks for your feedback.
> >
> > You mean the spark-submit context? Or like the SparkContext and
> > SparkSession? I don't think we could keep that alive, because it wouldn't
> > work out with multiple calls to spark-submit. I do feel your pain, though.
> > Maybe someone else can see how this might be done?
> >
> > If SparkContext was able to open the spark/pyspark console, then multiple
> > job submissions would be possible. I didn't have this in mind but an
> > InteractiveSparkContext or SparkConsoleContext might be able to do this?
> >
> > Russell Jurney @rjurney <http://twitter.com/rjurney>
> > [email protected] LI <http://linkedin.com/in/russelljurney> FB
> > <http://facebook.com/jurney> datasyndrome.com
> >
> > On Sat, Mar 18, 2017 at 3:02 PM, Ruslan Dautkhanov <[email protected]
> >
> > wrote:
> >
> >> +1 Great idea.
> >>
> >> my two cents - it would be nice (as an option) if SparkOperator would be
> >> able to keep context open between different calls,
> >> as it takes 30+ seconds to create a new context (on our cluster). Not
> >> sure how well it fits Airflow architecture.
> >>
> >>
> >>
> >> --
> >> Ruslan Dautkhanov
> >>
> >> On Sat, Mar 18, 2017 at 3:45 PM, Russell Jurney <
> [email protected]>
> >> wrote:
> >>
> >>> What do people think about creating a SparkOperator that uses spark-submit
> >>> to submit jobs? Would work for Scala/Java Spark and PySpark. The patterns
> >>> outlined in my presentation on Airflow and PySpark
> >>> <http://bit.ly/airflow_pyspark> would fit well inside an Operator, I
> >>> think.
> >>> BashOperator works, but why not tailor something to spark-submit?
> >>>
> >>> I'm open to doing the work, but I wanted to see what people thought
> >>> about it
> >>> and get feedback about things they would like to see in SparkOperator
> and
> >>> get any pointers people had to doing the implementation.
> >>>
> >>> Russell Jurney @rjurney <http://twitter.com/rjurney>
> >>> [email protected] LI <http://linkedin.com/in/russelljurney> FB
> >>> <http://facebook.com/jurney> datasyndrome.com
> >>>
> >>
>
>
