Thanks for sharing your slides, Laura! I think I've watched all the Airflow-related slides I could find, and you did a very good job; I'm adding your slides to my collection :) I especially liked how you explained the execution date concept, but I wish you had elaborated on backfills and on running the same DAG in parallel (if you do this sort of thing). I think this is the most confusing part of Airflow, and it needs good explanations and examples.
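To make the execution-date and backfill semantics above concrete, here is a minimal sketch in plain Python (no Airflow imports). `backfill_runs` is a hypothetical helper, not an Airflow API, and the `updated_at` column is an assumed source-table column. The sketch assumes a fixed daily schedule and treats the backfill end date as inclusive. Each run is labelled with the start of the interval it covers, which is why the daily run stamped 2016-10-01 can only process its data once 2016-10-02 has begun.

```python
from datetime import datetime, timedelta


def backfill_runs(start_date, end_date, interval):
    """Hypothetical helper: list the (execution_date, interval_end) pairs a
    backfill would cover. Each run is labelled with the START of its interval
    and can only be processed once that interval has closed (interval_end).
    end_date is treated as inclusive (an assumption of this sketch)."""
    runs = []
    execution_date = start_date
    while execution_date <= end_date:
        runs.append((execution_date, execution_date + interval))
        execution_date += interval
    return runs


# Three daily runs; each one selects only the rows in its own [start, end)
# window, so rerunning one date does not touch the data of any other date.
for execution_date, interval_end in backfill_runs(
    datetime(2016, 10, 1), datetime(2016, 10, 3), timedelta(days=1)
):
    print(
        f"run {execution_date:%Y-%m-%d}: "
        f"WHERE updated_at >= '{execution_date:%Y-%m-%d}' "
        f"AND updated_at < '{interval_end:%Y-%m-%d}'"
    )
```

Because every run owns a disjoint window, runs for different dates can in principle execute in parallel, provided the tasks do not depend on earlier runs (Airflow's `depends_on_past` task argument); the DAG-level `max_active_runs` setting limits how many run at once.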
On Mon, Oct 17, 2016 at 5:19 PM, Laura Lorenz <[email protected]> wrote:
> Same! I actually recently gave a talk about how my company uses Airflow at
> PyData DC. The video isn't live yet, but the slides are here
> <http://www.slideshare.net/LauraLorenz4/how-i-learned-to-time-travel-or-data-pipelining-and-scheduling-with-airflow>.
> In substance it's actually very similar to what you've written.
>
> I have some Airflow-specific ideas about ways to write custom sensors that
> poll job APIs (pretty common for us). We dynamically generate tasks from
> external metadata by embedding an API call in the DAG definition file,
> and I'm not sure whether that is a best practice...
>
> Anyway, if it makes sense to contribute these case studies for
> consideration as a 'best practice', and if this is the right place or way
> to do it, I'm game. I agree that the resources and thought leadership on
> ETL design are fragmented, and I think the Airflow community is fertile
> ground for discussing it.
>
> On Sun, Oct 16, 2016 at 6:40 PM, Boris Tyukin <[email protected]>
> wrote:
>
> > I really look forward to it, Gerard! I've read what you've written so far
> > and I really liked it; please keep up the great work!
> >
> > I am hoping to see some best practices for designing incremental loads
> > and using timestamps from source database systems (we are not on UTC, so
> > I'm still confused about how that works in Airflow). Also the practical
> > use of SubDAGs and dynamic generation of tasks from external metadata
> > (maybe describe in detail something similar to what WePay did:
> > https://wecode.wepay.com/posts/airflow-wepay).
> >
> > On Sun, Oct 16, 2016 at 5:23 PM, Gerard Toonstra <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > About a year ago, I contributed the HTTPOperator/Sensor and I've been
> > > tracking Airflow since. Right now it looks like we're going to adopt
> > > Airflow at the company I'm currently working at.
> > >
> > > In preparation for that, I've done a bit of research into how Airflow
> > > pipelines should fit together and how important ETL principles are
> > > covered, and decided to write this up on a documentation site. The
> > > Airflow documentation site explains how Airflow works and the constructs
> > > you have available to build pipelines, but it can still be a challenge
> > > for newcomers to figure out how to put those constructs together and use
> > > them effectively.
> > >
> > > The articles I found online don't go into much detail either. Airflow
> > > is built around an important philosophy towards ETL, and there's a risk
> > > that newcomers simply pick up a really great tool and start off using it
> > > the wrong way.
> > >
> > > This weekend, I set off to write some documentation to try to fill this
> > > gap. It starts with a general overview of important ETL principles, and
> > > I'm currently working on a practical step-by-step example that adheres
> > > to these principles with DAG implementations in Airflow, i.e. showing
> > > how it can all fit together.
> > >
> > > You can find the current version here:
> > >
> > > https://gtoonstra.github.io/etl-with-airflow/index.html
> > >
> > > Looking forward to your comments. If you have better ideas on how I can
> > > make this contribution, don't hesitate to contact me with your
> > > suggestions.
> > >
> > > Best regards,
> > >
> > > Gerard
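Laura's custom sensors that poll job APIs follow Airflow's sensor pattern: a subclass of `BaseSensorOperator` implements `poke(context)`, which is called every `poke_interval` seconds until it returns True or `timeout` is exceeded. The sketch below reproduces only that polling loop in plain Python so it runs without Airflow installed. The job API, the `get_status` callable, and its status strings ("RUNNING", "SUCCEEDED", "FAILED") are hypothetical placeholders, not a real service, and `wait_for_job`/`make_job_poke` are illustrative names.

```python
import time


class SensorTimeout(Exception):
    """Raised when the polled condition is not met before the timeout."""


def wait_for_job(poke, poke_interval=60.0, timeout=3600.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Call poke() until it returns True, sleeping poke_interval seconds
    between attempts; raise SensorTimeout once timeout seconds have passed.
    clock and sleep are injectable so the loop can be tested without waiting."""
    started = clock()
    while not poke():
        if clock() - started >= timeout:
            raise SensorTimeout(f"condition not met within {timeout} seconds")
        sleep(poke_interval)
    return True


def make_job_poke(get_status, job_id):
    """Build a poke callable for a hypothetical job API.

    get_status(job_id) is assumed to return "RUNNING", "SUCCEEDED" or "FAILED".
    A failed job raises immediately instead of being polled until the timeout.
    """
    def poke():
        status = get_status(job_id)
        if status == "FAILED":
            raise RuntimeError(f"job {job_id} failed")
        return status == "SUCCEEDED"
    return poke


# Usage with a fake status source that reports success on the third poll.
fake_statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
wait_for_job(make_job_poke(lambda job_id: next(fake_statuses), "job-42"),
             poke_interval=0.01, timeout=5)
```

Raising on a "FAILED" status, rather than returning False, keeps the sensor from waiting out its full timeout on a job that will never succeed.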
