Hi all,

About a year ago, I contributed the HTTPOperator/Sensor, and I've been tracking Airflow ever since. It now looks like we're going to adopt Airflow at the company I currently work for.
In preparation for that, I've done some research into how Airflow pipelines should fit together and how they cover important ETL principles, and I decided to write this up on a documentation site. The Airflow documentation site explains everything about how Airflow works and which constructs are available to build pipelines, but newcomers can still find it challenging to figure out how to put those constructs together and use them effectively. The articles I found online don't go into much detail either. Airflow is built around an important philosophy towards ETL, and there's a risk that newcomers pick up a really great tool and start off using it the wrong way.

This weekend, I set out to write some documentation to fill this gap. It starts with a general overview of important ETL principles. I'm currently working on a practical step-by-step example, with DAG implementations in Airflow, that adheres to these principles, i.e. showing how it all fits together. You can find the current version here: https://gtoonstra.github.io/etl-with-airflow/index.html

Looking forward to your comments. If you have better ideas on how I can make this contribution, don't hesitate to contact me with your suggestions.

Best regards,
Gerard