potiuk commented on PR #54383:
URL: https://github.com/apache/airflow/pull/54383#issuecomment-3227196030

   > I just stumbled across this PR, it's quite a massive change but the 
description has almost no information/justification/explanation. Can someone 
provide some context for the goal here? @uranusjr @kaxil
   
   I have not reviewed it in detail yet (it's huge), but let me explain how I 
understand the change:
   
   
   The whole thing here is that we should get rid of airflow.models.DAG, which 
was used to define Dags. When we implemented Dag serialization in Airflow 2, we 
introduced SerializedDag, which we used in places where we only loaded the Dag 
from its serialized form, and which was (quoting one of my favourite authors, 
Douglas Adams) "almost, but not quite entirely unlike DAG". The two classes had 
different methods and helper methods: almost the same, but different.
   
   There was some class hierarchy that was supposed to make things easier 
(BaseDag), but generally speaking it was very difficult to reason about when to 
use which. It's always been quite a complex and historically "convoluted" part 
of the Airflow code. Not because it was designed like that, but because it grew 
out of incremental changes that we applied to the original DAG class (notably 
Dag serialization) and that made it overly complex.
   
   In the process of moving to Task SDK, the Dag definition (the one Dag 
authors use to create Dags) has been moved to Task SDK. So far, so good. 
However, we were still using the DAG from airflow.models (which actually 
derived from Task SDK's Dag), because airflow.models.DAG was used in many, many 
places, and a number of its methods and properties should NOT be moved to the 
Task SDK Dag because they are simply not needed there. Many of those methods 
were actually only usable in tests, and many were only needed for airflow-core 
internals. Our goal is to make Task SDK as small and lean as possible so that 
we expose to Dag authors only what they **really** need.
   
   This is also a step towards complete "airflow-core" and "task.sdk" 
separation. There are still a few task-sdk -> airflow.models.Dag references, 
which are removed in this PR. The idea (for Airflow 3.1) is that we end up with 
airflow-core using task.sdk, but with task.sdk NOT using airflow.
   
   If I understand correctly, after this step is completed airflow-core 
(scheduler, triggerer) should generally only use SerializedDag. DagProcessor 
(for the parsing part) will use the Task SDK Dag to produce a SerializedDag 
(and store it in the database).
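   
   To make that split concrete, here is a toy sketch of the idea. It is not 
Airflow code: AuthoringDag, StoredDag, serialize and runnable are hypothetical 
names used only for illustration, and the JSON layout is an assumption, not 
Airflow's actual serialization format. The point is only that the read side is 
built purely from serialized data and never imports the authoring class.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class AuthoringDag:
    """Hypothetical stand-in for the authoring-side Dag (the Task SDK role)."""

    dag_id: str
    # Maps task_id -> list of upstream task_ids.
    tasks: dict[str, list[str]] = field(default_factory=dict)

    def add_task(self, task_id: str, upstream: tuple[str, ...] | list[str] = ()) -> None:
        self.tasks[task_id] = list(upstream)


def serialize(dag: AuthoringDag) -> str:
    """Roughly what a Dag processor does after parsing: emit plain data."""
    return json.dumps({"dag_id": dag.dag_id, "tasks": dag.tasks}, sort_keys=True)


@dataclass(frozen=True)
class StoredDag:
    """Hypothetical read-only counterpart (the SerializedDag role).

    It is constructed only from the serialized payload and carries only
    what scheduler-side code needs.
    """

    dag_id: str
    tasks: dict[str, tuple[str, ...]]

    @classmethod
    def from_json(cls, payload: str) -> StoredDag:
        data = json.loads(payload)
        return cls(
            dag_id=data["dag_id"],
            tasks={task_id: tuple(ups) for task_id, ups in data["tasks"].items()},
        )

    def runnable(self, done: set[str]) -> list[str]:
        """Return tasks that are not done and whose upstream tasks are all done."""
        return sorted(
            task_id
            for task_id, ups in self.tasks.items()
            if task_id not in done and all(up in done for up in ups)
        )


# Authoring side: define the Dag, then hand over only the serialized form.
dag = AuthoringDag("example")
dag.add_task("extract")
dag.add_task("load", upstream=["extract"])
payload = serialize(dag)

# Core side: works from the payload alone.
stored = StoredDag.from_json(payload)
print(stored.runnable(set()))        # ['extract']
print(stored.runnable({"extract"}))  # ['load']
```

   In this sketch the dependency only goes one way: the read-side class knows 
the payload shape but nothing about AuthoringDag, which mirrors the "core uses 
serialized Dags, Task SDK does not use core" direction described above.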
   
   
   
   

