Hello! My name is Russell Jurney. I am a relatively new Airflow user and just joined the group. I am an Azkaban refugee, and an enemy of Oozie and the tyranny of XML.
I wanted to tell you about my new book, out in pre-release, called Agile Data Science 2.0 <http://bit.ly/agile_data_science> (O'Reilly 2017). In the book, we use Airflow in chapter 2, Setup, in a way similar to the Airflow tutorial. Then, in chapter 8, Deploying Predictive Systems, we use Airflow to deploy a predictive system built with PySpark and Spark MLlib.

Some highlights in the code at http://github.com/rjurney/Agile_Data_Code_2:

- ch02/airflow_test.py <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch02/airflow_test.py> is a complete Airflow/PySpark tutorial, along with ch02/pyspark_task_one.py <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch02/pyspark_task_one.py> and ch02/pyspark_task_two.py <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch02/pyspark_task_two.py>.
- The Airflow setup for chapter 8 is at ch08/airflow/setup.py <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch08/airflow/setup.py>.
- The scripts that it operates on are in ch08/ <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch08> and show things like how to use '{{ ds }}' and other parameters to hook your scripts into 'airflow backfill' and other features.
- ch08/make_predictions.py <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch08/make_predictions.py> shows how to set up a PySpark environment in a script in a way that can work with Airflow.

If there is any interest, I would love to present on something like "Building Predictive Systems with Spark and Airflow" at an upcoming Airflow meetup.

Thanks!
--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io