Hi all,

About a year ago, I contributed the HTTPOperator/Sensor and I've been
following Airflow ever since. Right now it looks like we're going to adopt
Airflow at the company I'm currently working at.

In preparation for that, I've done a bit of research into how Airflow
pipelines should fit together and how important ETL principles are covered,
and decided to write this up on a documentation site. The Airflow
documentation site covers how Airflow works and the constructs that are
available to build pipelines, but it can still be a challenge for
newcomers to figure out how to put those constructs together and use them
effectively.

The articles I found online don't go into a lot of detail either. Airflow
is built around an important philosophy towards ETL and there's a risk that
newcomers simply pick up a really great tool and start off in the wrong way
when using it.


This weekend, I set out to write some documentation to try to fill this
gap. It starts off with a general overview of important ETL principles,
and I'm currently working on a practical step-by-step example that adheres
to these principles with DAG implementations in Airflow, i.e. showing how
it can all fit together.

You can find the current version here:

https://gtoonstra.github.io/etl-with-airflow/index.html


Looking forward to your comments. If you have better ideas on how I can
make this contribution, don't hesitate to contact me with your suggestions.

Best regards,

Gerard
