Thanks for sharing your slides, Laura! I think I've watched all the
Airflow-related slides I could find and you did a very good job - adding your
slides to my collection :)  I especially liked how you explained the
execution date concept, but I wish you could elaborate on the backfill concept
and on running the same DAG in parallel (if you guys do this sort of thing) -
I think this is the most confusing part of Airflow and needs good
explanation / examples.
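
To make the execution date / backfill question concrete, here is a rough
sketch in plain Python (no Airflow imports). It assumes a daily schedule and
that a backfill covers both the start and end dates; the function name
backfill_runs is made up for illustration and is not an Airflow API. The
point it tries to show: each run is labelled with the start of the interval
it covers, and can only start once that interval has ended.

```python
from datetime import datetime, timedelta


def backfill_runs(start, end, interval):
    """Yield (execution_date, earliest_start) pairs for a date range.

    Hypothetical helper, not part of Airflow. Assumes both `start` and
    `end` are inclusive execution dates.
    """
    execution_date = start
    while execution_date <= end:
        # The run labelled `execution_date` covers the interval that begins
        # at that timestamp, so it cannot start before the interval is over.
        yield execution_date, execution_date + interval
        execution_date += interval


runs = list(
    backfill_runs(datetime(2016, 10, 1), datetime(2016, 10, 3), timedelta(days=1))
)
for execution_date, earliest_start in runs:
    print(execution_date.date(), "can start no earlier than", earliest_start.date())
```

Under these assumptions a backfill over three days is just three independent
runs, one per execution date, which is also why they could in principle run
in parallel as long as the tasks do not depend on the previous run.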

On Mon, Oct 17, 2016 at 5:19 PM, Laura Lorenz <[email protected]>
wrote:

> Same! I actually recently gave a talk about how my company uses airflow at
> PyData DC. The video isn't live yet, but the slides are here
> <http://www.slideshare.net/LauraLorenz4/how-i-learned-to-time-travel-or-data-pipelining-and-scheduling-with-airflow>.
> In substance it's actually very similar to what you've written.
>
> I have some airflow-specific ideas about ways to write custom sensors that
> poll job apis (pretty common for us). We do dynamic generation of tasks
> using external metadata by embedding an API call in the DAG definition
> file, which I'm not sure is a best practice or not...
>
> Anyways, if it makes sense to contribute these case studies for
> consideration as a 'best practice', if this is the place or way to do it,
> I'm game. I agree that the resources and thought leadership on ETL design
> are fragmented, and think the Airflow community is fertile ground for
> discussion about it.
>
> On Sun, Oct 16, 2016 at 6:40 PM, Boris Tyukin <[email protected]>
> wrote:
>
> > I really look forward to it, Gerard! I've read what you wrote so far
> > and I really liked it - please keep up the great job!
> >
> > I am hoping to see some best practices for the design of incremental
> > loads and using timestamps from source database systems (not being on
> > UTC so still confused about it in Airflow). Also practical use of subdags
> > and dynamic generation of tasks using some external metadata (maybe
> > describe in detail something similar to what wepay did
> > https://wecode.wepay.com/posts/airflow-wepay)
> >
> >
> > On Sun, Oct 16, 2016 at 5:23 PM, Gerard Toonstra <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > About a year ago, I contributed the HTTPOperator/Sensor and I've been
> > > tracking airflow since. Right now it looks like we're going to adopt
> > > airflow at the company I'm currently working at.
> > >
> > > In preparation for that, I've done a bit of research into how airflow
> > > pipelines should fit together and how important ETL principles are
> > > covered, and decided to write this up on a documentation site. The
> > > airflow documentation site contains everything on how airflow works and
> > > the constructs that you have available to build pipelines, but it can
> > > still be a challenge for newcomers to figure out how to put those
> > > constructs together to use it effectively.
> > >
> > > The articles I found online don't go into a lot of detail either.
> > > Airflow is built around an important philosophy towards ETL and there's
> > > a risk that newcomers simply pick up a really great tool and start off
> > > in the wrong way when using it.
> > >
> > >
> > > This weekend, I set off to write some documentation to try to fill this
> > > gap. It starts off with a generic understanding of important ETL
> > > principles and I'm currently working on a practical step-by-step example
> > > that adheres to these principles with DAG implementations in airflow;
> > > i.e. showing how it can all fit together.
> > >
> > > You can find the current version here:
> > >
> > > https://gtoonstra.github.io/etl-with-airflow/index.html
> > >
> > >
> > > Looking forward to your comments. If you have better ideas how I can
> > > make this contribution, don't hesitate to contact me with your
> > > suggestions.
> > >
> > > Best regards,
> > >
> > > Gerard
> > >
> >
>
