datapythonista opened a new issue #11208:
URL: https://github.com/apache/airflow/issues/11208


   **Description**
   
   I'm new to Airflow, and it was very easy to get started creating a simple 
DAG. Thanks for the tutorial and other beginner docs. But after getting the 
"hello world" kind of DAG from the tutorial working (with independent tasks that don't 
manage data), I'm having a hard time understanding how Airflow is expected to 
be used when working with data pipelines.
   
   As an example, I would like to implement a task that does the following:
   1. Download a CSV file from a url (via http/https)
   2. Query a database (PostgreSQL for example) and save the data locally (in 
CSV format)
   3. Concatenate the two previous files
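
To make the three steps concrete, here is a minimal sketch in plain Python using only the standard library. It is not an Airflow operator and not a recommended Airflow pattern. The function names (`download_csv`, `query_to_csv`, `concat_csv`) are hypothetical, and the query step assumes any DB-API connection (for example `sqlite3`, standing in for PostgreSQL). Presumably each function could be called from its own task, but that is an assumption, not something the docs state.

```python
import csv
import shutil
import sqlite3
import urllib.request
from pathlib import Path


def download_csv(url: str, dest: Path) -> Path:
    """Step 1: fetch a CSV from a URL and store it locally."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def query_to_csv(conn, sql: str, dest: Path) -> Path:
    """Step 2: run a query on a DB-API connection and save the rows as CSV."""
    cur = conn.cursor()
    cur.execute(sql)
    # Column names come from the cursor metadata, so the file gets a header.
    header = [col[0] for col in cur.description]
    with open(dest, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(cur)
    return dest


def concat_csv(first: Path, second: Path, dest: Path) -> Path:
    """Step 3: concatenate two CSV files, keeping only the first header.

    Assumes both files have a header row and the same columns.
    """
    with open(dest, "w", newline="") as out:
        writer = csv.writer(out)
        for index, path in enumerate((first, second)):
            with open(path, newline="") as src:
                rows = csv.reader(src)
                header = next(rows, None)
                if index == 0 and header is not None:
                    writer.writerow(header)
                writer.writerows(rows)
    return dest
```

With a PostgreSQL driver, the same `query_to_csv` would be handed a connection from that driver instead of `sqlite3.connect(...)`.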
   
   It seems to me (maybe I'm wrong) that 1 and 2 should be quite standard, and 
that I shouldn't be implementing custom operators for common things, given the 
long list of operators in 
[airflow.operators](https://airflow.apache.org/docs/stable/_api/airflow/operators/index.html)
 and 
[airflow.contrib.operators](https://airflow.apache.org/docs/stable/_api/airflow/contrib/operators/index.html).
 But I couldn't find operators for them, so I guess that's the case.
   
   I've been checking many of the documentation pages, Concepts, How-to 
guides... and I get lost in an ocean of advanced concepts while I'm still trying 
to understand the basics and something that IMHO should be quite common.
   
   **Use case / motivation**
   
   I think for beginners like me, Airflow should provide more guidance in the 
tutorial for building data pipelines (which in my understanding is one of the 
main use cases), before expecting visitors of the docs to navigate all the more 
specific and advanced topics.
   
   In particular, it should provide context and let users know whether they 
should be creating operators for most of their tasks (and probably how), and 
what building a real data pipeline involves.
   
   I think dividing the tutorial into two parts could be a way to achieve this:
   1. Writing a "hello world" pipeline (the current tutorial)
   2. Writing a data pipeline in Airflow (how to implement something like the 
three-step example I mentioned previously)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

