ferruzzi commented on issue #59303:
URL: https://github.com/apache/airflow/issues/59303#issuecomment-3648219147

   I thought I did a reasonable job separating the interests into SDk and 
public, but I guess not.  Sorry about that.  I have [a PR 
up](https://github.com/apache/airflow/pull/30125) which strips deadlines out of 
the Serialized Dag and I would greatly appreciate it if you held off until that 
was merged so I don't have to redo that work.  But AFAIK that should be the 
last major refactor on the Deadlines code.  I know @ramitkataria  and 
@seanghaeli have Callback work they are actively working on, but I don't know 
offhand how that will be affected by the move.  It might be better for you to 
move ahead and them/us to adapt, but I'll let them weigh in on that hopefully.
   
   
   I think your reorganization plan looks right, but just to be sure, here is 
the workflow after the above PR is merged:
   
   - User defines a DAG() with `deadline=DeadlineAlert()`
   - That gets stored in the DAG object as `deadline: list[DeadlineAlert]`
   - ((Changes start here)):  When the DAG is serialized, each DeadlineAlert is 
written to the deadline_alert table and the serialized_dag stores a list[str] 
containing the table row reference instead of storing the entire serialized 
DeadlineAlert, and only fetched/expanded from the UUID when needed (usually at 
dagrun creation)
   
   I think your only real issue would be that 
task-sdk/src/airflow/sdk/definitions/deadline.py imports `from 
airflow.models.deadline import ReferenceModels` which shouldn't have crossed 
over?  Other than maybe needing to untangle that, I think it's fairly 
straight-forward.  On the SDk side you'd have
   
   ```
   SDK (task-sdk/):
   ├─ deadline.py
      ├─ DeadlineAlert (user-facing, no changes)
      └─ DeadlineReference (public interface, needs to NOT import 
ReferenceModels)
   ```
   
   and on the Core side:
   
   ```
   Core (airflow-core/src/airflow/):
   ├─ serialization/definitions/
      ├─ deadline.py
      │  ├─ SerializedDeadlineAlert 
      │  └─ SerializedDeadlineReference ((I think maybe wrapping 
ReferenceModels here would untangle the cross-referencing??? Maybe???))
      └─ reference_models.py
         └─ Move the DeadlineReference implementation out of 
models/deadline.py?)
   │
   ├─ serialization/ ((new - pull these out of serialized_objects.py))
      ├─ encoders/deadline.py
      └─ decoders/deadline.py
   │
   └─ models/
       ├─ deadline.py (keep Deadline ORM model, remove ReferenceModels)
       └─ deadline_alert.py (keep DeadlineAlert ORM model)
   ```
   
   One thing that may simplify (or maybe complicate?) things after PR 
https://github.com/apache/airflow/pull/58248 is that deserialization will now 
be a 2-step process and we only deserialize "on demand":
   
   - Step 1 - deserialize DAG: `dag.deadline` stays as `list[str]` of UUIDs - 
no Core types needed
   - Step 2 - expand on use: `_process_dagrun_deadline_alerts()` expands UUIDs 
to full objects - uses Core types
   
   I'm not entirely sure how that would look; maybe it'll simplify things for 
you.
   
   
   I think the only other major thing I have in mind for Deadlines was based on 
a comment in another PR that suggested renaming some internal stuff, which I 
guess would make sense after you do your reorganization.  But IFF you think 
it's cleaner to do it when you move them, I was going to internally rename 
`DeadlineAlerts` and related object names to `deadline_def` - Short for 
`deadline definition`.  Currently the user defines the `DeadlineAlert` in their 
Dag definition, the `DeadlineAlert` gets stored in the dag object as `deadline` 
(which is now a list of deadlines, making it even more confusing) and the 
`deadline` is used to calculate a `deadline`.  Naming got pretty confusing and 
it's much more intuitive seeing "The user defines a Deadline Alert, stored as 
`deadline_def`, and that definition is used to calculate a `deadline`".  But as 
I said, that's likely better as an entirely separate project after the move, in 
order to keep the changes easier to follow.
   
   Also, once PR https://github.com/apache/airflow/pull/58248 is merged, I 
think we can finally safely drop `dag._deadline` entirely and that should 
simplify the split I think?  That's likely not part of your split, but maybe  
it could make it easier?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to