Hello! My name is Russell Jurney. I am a relatively new Airflow user and
just joined the group. I am an Azkaban refugee, and an enemy of Oozie and
the tyranny of XML.

I wanted to tell you about my new book, out in pre-release, called Agile
Data Science 2.0 <http://bit.ly/agile_data_science> (O'Reilly 2017). In the
book, we use Airflow in chapter 2, Setup, in a way similar to the Airflow
tutorial. Then, in chapter 8, Deploying Predictive Systems, we use Airflow
to deploy a predictive system built with PySpark and Spark MLlib.

Some highlights in the code at http://github.com/rjurney/Agile_Data_Code_2:

   - ch02/airflow_test.py
   <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch02/airflow_test.py>
   is a complete Airflow/PySpark tutorial along with ch02/pyspark_task_one.py
   <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch02/pyspark_task_one.py>
   and ch02/pyspark_task_two.py
   <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch02/pyspark_task_two.py>.
   - The Airflow setup for chapter 8 is at ch08/airflow/setup.py
   <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch08/airflow/setup.py>.
   - The scripts that it operates on are in ch08/
   <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch08> and show
   things like how to use '{{ ds }}' and other parameters to hook your scripts
   into 'airflow backfill' and other features.
   - ch08/make_predictions.py
   <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch08/make_predictions.py>
   shows how to set up a PySpark environment in a script in a way that can
   work with Airflow.
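
To give a flavor of the '{{ ds }}' point above, here is a minimal sketch of a
script written so that Airflow can hand it a date. This is not the code from
the book: the script name process_day.py, the params.base_path parameter, and
the data/daily/ directory layout are all hypothetical, and the Airflow side
appears only as a template string so the example runs without Airflow
installed.

```python
import argparse
import datetime

# Hypothetical Airflow side, shown only as a string so this sketch runs without
# Airflow installed. A BashOperator's bash_command is a Jinja template, and
# Airflow fills in {{ ds }} with the run's date as YYYY-MM-DD.
BASH_COMMAND_TEMPLATE = (
    "python {{ params.base_path }}/process_day.py "
    "{{ ds }} {{ params.base_path }}"
)


def parse_args(argv=None):
    """Parse the date and base path that the templated command passes in."""
    parser = argparse.ArgumentParser(description="Process one day of data.")
    parser.add_argument("ds", help="run date as YYYY-MM-DD, from '{{ ds }}'")
    parser.add_argument("base_path", help="project root directory")
    args = parser.parse_args(argv)
    # Fail fast on a malformed date instead of reading the wrong files.
    datetime.datetime.strptime(args.ds, "%Y-%m-%d")
    return args


def input_path_for(ds, base_path):
    """Build the per-day input path (the directory layout is an assumption)."""
    return f"{base_path}/data/daily/{ds}.json"


def main(argv=None):
    """Resolve the day's input file; real processing would go here."""
    args = parse_args(argv)
    path = input_path_for(args.ds, args.base_path)
    print(f"Processing {path} for {args.ds}")
    return path
```

Since the script depends only on its arguments, running 'airflow backfill'
over a date range should amount to invoking it once per day with a different
date, which is the property that makes a script easy to hook in.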

If there is any interest, I would love to present on something like
"Building Predictive Systems with Spark and Airflow" at an upcoming Airflow
meetup.

Thanks!
-- 
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
