potiuk commented on issue #24537: URL: https://github.com/apache/airflow/issues/24537#issuecomment-1159462106
The serialization only stores DAG structure and configuration, not Python code. So the basic assumption here unfortunately does not hold: the serialization you have in mind is different from the serialization we have in Airflow. A Python DAG is much more than a single Python file. It is the byte code of the class, but also potentially imported classes, and even files that are read by the Python code placed in the DAG folder or any of its subfolders, potentially loaded dynamically by your Python code based on any number of factors.

There are a number of discussions on how to do what you want, resulting in ideas like DAG Fetcher https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=109445755 or DAG Manifest https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest and they involve changing and limiting how DAGs are written and potentially annotated (and serialized, including all the information necessary not only to know the DAG structure but also to execute it).

Many of the challenges and problems to solve were very nicely described in a recent talk from Airflow Summit 2022, where AirBnB explained how they did something similar internally, what limitations they had to introduce, and how much they had to enforce internally to make it happen. https://youtu.be/5Ap2t9qJE18 This is PRECISELY about making DAGs submittable via REST API. I recommend that anyone who wants to take further part in the discussion watch that talk, as it explains everything you need to know to understand the complexities involved.

Speaking of which: once you, or anyone else who wishes to discuss it, have watched the talk, thought about those issues and understood the complexities involved, you are free to propose an AIP for that and discuss it on the devlist. Like everyone in the community, you are free to do it and start a devlist discussion.
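To make the distinction from the start of this comment concrete (structure can be serialized, executable code cannot), here is a minimal sketch using only the Python standard library. It is not Airflow's serialization code; `toy_dag`, `extract` and `serialize_structure` are hypothetical names invented for the illustration. It also touches the pickle point: pickle stores functions by reference (module and name), not by value.

```python
import json
import pickle


def extract(path):
    """Hypothetical task callable; a real one might import modules or read other files."""
    with open(path) as handle:
        return handle.read()


# A toy stand-in for a DAG (an assumption for illustration, not an Airflow object):
# task ids, dependencies, and the Python callables that do the actual work.
toy_dag = {
    "dag_id": "example",
    "tasks": {
        "extract": {"callable": extract, "downstream": ["load"]},
        "load": {"callable": lambda rows: len(rows), "downstream": []},
    },
}


def serialize_structure(dag):
    """Keep only structure and configuration; each callable is reduced to its name."""
    tasks = {
        task_id: {
            "callable_name": task["callable"].__name__,
            "downstream": task["downstream"],
        }
        for task_id, task in dag["tasks"].items()
    }
    return json.dumps({"dag_id": dag["dag_id"], "tasks": tasks}, sort_keys=True)


restored = json.loads(serialize_structure(toy_dag))

# The structure survives the round trip...
structure_ok = restored["tasks"]["extract"]["downstream"] == ["load"]
# ...but nothing executable does: all that is left of the callable is a string.
callable_survived = callable(restored["tasks"]["extract"]["callable_name"])

# pickle stores a function as a reference (module + qualified name), so the
# payload carries no bytecode. The receiving side must already have the same
# code importable. json.dumps is used here simply as a handy importable function.
payload = pickle.dumps(json.dumps)
bytecode_in_payload = json.dumps.__code__.co_code in payload

# A lambda has no importable name, so it cannot be pickled at all.
try:
    pickle.dumps(toy_dag["tasks"]["load"]["callable"])
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
```

Whoever receives `restored` or `payload` can only run the task if the very same code, with all its imports and side files, is already present on their side, which is the gap the proposals mentioned above try to close.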
Just be prepared that you will have to lead and get consensus on how to solve those issues in a generic way. People will have different opinions and ideas there, and the biggest complexity of the approach will be getting to consensus: what trade-offs to make, how to implement serialization, how to handle dependencies, how to handle different Python versions (pickle!). Most of the issues (without ready solutions) are mentioned in the talk. I personally think it would be great to have it, but it will be months of discussions at least and years of implementation in incremental steps to get there. So just be prepared for a marathon, not a sprint, whoever wants to lead that discussion.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
