potiuk commented on issue #28402:
URL: https://github.com/apache/airflow/issues/28402#issuecomment-1370815981

   > I thought serialized dag is just a python source code, isn't it?
   
   No, it's not. It's just the JSON-serialized DAG structure and metadata.
   
   What you see in the UI is **just** the source code of the DAG in question -
you cannot see any of the code it imports there. And this is for "inspection"
only - it is not possible to run this code, because the dependent code is missing.
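   To make the distinction concrete, here is a hand-written sketch of what a serialized DAG record roughly looks like. This is **not** the exact Airflow schema (the real one lives in `airflow.serialization` and has many more fields) - it just illustrates that the stored blob is structure and metadata, with only *references* to callables, not importable Python source:

```python
import json

# Hand-written approximation of a serialized DAG record.
# NOT the real Airflow schema - just the general shape:
# structure and metadata, but no runnable Python code.
serialized_dag = {
    "dag": {
        "dag_id": "example_dag",
        "schedule_interval": "@daily",
        "fileloc": "/opt/airflow/dags/example_dag.py",
        "tasks": [
            {
                "task_id": "extract",
                "task_type": "PythonOperator",
                # Only the *name* of the callable is recorded,
                # not its implementation or its imports:
                "python_callable_name": "extract_fn",
            }
        ],
    },
}

# Round-trip through JSON, as the metadata DB effectively does.
blob = json.dumps(serialized_dag)
restored = json.loads(blob)
print(restored["dag"]["tasks"][0]["task_type"])
```

Nothing in such a record lets the worker reconstruct `extract_fn` - it still needs the actual DAG file (and everything it imports) on disk.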
   
   In the current state, we cannot (and should not) serialize Python code to
the database - simply because a Python DAG can import an arbitrary number of
libraries, common code, other DAGs, etc. And if you have dynamic or local
imports in the DAGs, it is extremely difficult (or actually impossible) to
determine which files should be put in such a database. Effectively, what you
are asking for is to store the whole DAG folder as a record in the database for
every single DAG run.
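   To see why the import set cannot be determined statically, consider a (hypothetical) DAG file that picks a helper module at parse time. The module name here is an assumption for illustration - a stdlib name is used so the sketch actually runs, but in a real DAG it would be some local helper package:

```python
import importlib
import os

# Hypothetical dynamic import, of the kind often seen in DAG files.
# The target module is only known when the file is *executed*:
helper_name = os.environ.get("HELPER_MODULE", "json")
helper = importlib.import_module(helper_name)

# No static analysis of the source can tell, in general, which file
# this resolves to - so "store the DAG's source code in the DB"
# cannot guarantee that all of its dependencies are captured too.
print(helper.__name__)
```

Any serialization scheme that copied only the DAG file itself would silently miss whatever `helper` resolves to at runtime.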
   
   With the way Airflow currently works and how "flexible" Python is, that
makes no sense - any kind of file sharing does the job much better than trying
to read the whole DAG folder and convert it into a blob of all DAG files stored
in a relational database. From a performance point of view, it makes no sense.
   
   Changing this would be quite a fundamental change to how Airflow works, so
it definitely does not pass the bar of a "Feature" - it clearly belongs in the
"Airflow Improvement Proposal" camp (so no, @hussein-awala - I don't think we
are going to assign it to anyone, as this is definitely not something that
would ever get accepted before we have a proper proposal and discussion about
it).
   
   There are open, never-completed Airflow Improvement Proposals that aimed at
solving this problem:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher and
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest.
   
   If you would like to change the behaviour, the right approach is to pick one
of them, complete it (they are in Draft status), be prepared to explain and
defend all the different cases, and start a discussion about it on the Airflow
devlist (see https://airflow.apache.org/community/ for details on how to join
it). You will need to specify it at a level of detail that allows assessing all
the cases: small/big deployments, performance considerations, and the different
use cases.
   
   Converting it into discussion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
