potiuk commented on issue #24537:
URL: https://github.com/apache/airflow/issues/24537#issuecomment-1159462106

   The serialization only stores the DAG structure and configuration, not Python 
code. So, unfortunately, the basic assumption here does not hold: the 
serialization you have in mind is different from the serialization we have in 
Airflow.
   
   A Python DAG is much more than just a single Python file. It is the byte code 
of the class, but potentially also imported classes and even files 
that are read by the Python code placed in the DAG folder and any of its 
subfolders, possibly loaded dynamically by your Python code based on any number 
of factors.
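
   To make the point concrete, here is a toy sketch in plain Python (no Airflow 
imports). Every name in it, such as `make_structure`, `serialize_structure` and 
`tables.json`, is hypothetical, and it is not how Airflow's serializer is 
implemented. The task list depends on a file read at parse time, and a 
structure-only serialization keeps the task ids and the callable's name, but 
none of the code or the config file that produced them:

```python
import json
import os
import tempfile


def export_table(name):
    # The actual work a task would do; only its *name* survives serialization below.
    return f"exported {name}"


def load_table_names(config_path):
    # The DAG's shape depends on a file read at parse time.
    with open(config_path) as f:
        return json.load(f)["tables"]


def make_structure(config_path):
    # Toy "DAG": one task per table, all running the same Python callable.
    return {
        "dag_id": "toy_export",
        "tasks": [
            {"task_id": f"export_{t}", "callable": export_table}
            for t in load_table_names(config_path)
        ],
    }


def serialize_structure(dag):
    # Structure-only serialization: ids and callable names, no code, no config file.
    return json.dumps({
        "dag_id": dag["dag_id"],
        "tasks": [
            {"task_id": t["task_id"], "callable": t["callable"].__qualname__}
            for t in dag["tasks"]
        ],
    })


with tempfile.TemporaryDirectory() as tmp:
    cfg = os.path.join(tmp, "tables.json")
    with open(cfg, "w") as f:
        json.dump({"tables": ["users", "orders"]}, f)
    serialized = serialize_structure(make_structure(cfg))

print(serialized)
```

   Parsing the same file against a different `tables.json` yields a different 
DAG, and the serialized JSON alone is not enough to run `export_table` on a 
machine that does not have the code.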
   
   There are a number of discussions on how to do what you want, resulting in a 
number of ideas like DAG Fetcher 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=109445755 or 
DAG Manifest 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest. All of 
them involve changing, and limiting, how DAGs are written and 
potentially annotated (and serialized with all the information necessary 
not only to know the DAG structure but also to execute it).
   
   A lot of the challenges and problems to solve were very nicely 
described in a recent talk at Airflow Summit 2022, where Airbnb 
explained how they did something similar internally, what 
limitations they had to introduce, and how much they had to enforce internally 
to make it happen.
   
   
   https://youtu.be/5Ap2t9qJE18
   
   This is PRECISELY about making DAGs submittable via the REST API. I recommend 
that anyone who wants to take part further in the discussion watch that talk, as 
it explains everything you need to know to understand the complexities involved.
   
   Speaking of which: once you, or anyone else who wishes to discuss it, 
watches the talk, thinks about those issues and understands the complexities 
involved, you are free to propose and discuss an AIP for it on the devlist. Like 
everyone in the community, you are free to start a devlist discussion.
   
   Just be prepared that you will have to lead the discussion and get consensus 
on how to solve those issues in a generic way. People will have different 
opinions and ideas there, and the biggest challenge will be reaching consensus 
on what trade-offs to make, how to implement serialization, how to handle 
dependencies, and how to handle different Python versions (pickle!). Most of 
these issues (which have no ready solutions) are mentioned in the talk.
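
   As a small illustration of the pickle point, here is a sketch using only the 
standard library. `pickle` stores a plain function by reference (module path 
plus qualified name), not its byte code, so unpickling only works where the 
same module is importable. The module name `my_dag_helpers` is made up for the 
example:

```python
import pickle
import sys
import types

# Stand-in for a user's helper module imported by a DAG file (hypothetical name).
helper = types.ModuleType("my_dag_helpers")
exec("def transform(x):\n    return x * 2\n", helper.__dict__)
sys.modules["my_dag_helpers"] = helper

# Pickling a function records only where to find it: module path + qualified name.
payload = pickle.dumps(helper.transform)
print(b"my_dag_helpers" in payload, b"transform" in payload)

# Simulate the worker side, where the helper module is not importable.
del sys.modules["my_dag_helpers"]
try:
    pickle.loads(payload)
    restored = True
except ModuleNotFoundError:
    restored = False

print("restored:", restored)
```

   Pickle protocol versions add another wrinkle: protocol 5 was introduced in 
Python 3.8, so a payload written with it cannot be read by older interpreters.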
   
   I personally think it would be great to have it, but it will take at least 
months of discussion and years of incremental implementation to get 
there.
   
   So, whoever wants to lead that discussion: be prepared for a marathon, not a 
sprint.
   
   

