amoghrajesh commented on issue #59303:
URL: https://github.com/apache/airflow/issues/59303#issuecomment-3713418915

   > I thought I did a reasonable job separating the interests into SDk and 
public, but I guess not. Sorry about that. I have [a PR 
up](https://github.com/apache/airflow/pull/30125) which strips deadlines out of 
the Serialized Dag and I would greatly appreciate it if you held off until that 
was merged so I don't have to redo that work. But AFAIK that should be the last 
major refactor on the Deadlines code. I know 
[@ramitkataria](https://github.com/ramitkataria) and 
[@seanghaeli](https://github.com/seanghaeli) have Callback work they are 
actively working on, but I don't know offhand how that will be affected by the 
move. It might be better for you to move ahead and them/us to adapt, but I'll 
let them weigh in on that hopefully.
   > 
   > I think your reorganization plan looks right, but just to be sure, here is 
the workflow after the above PR is merged:
   > 
   > * User defines a DAG() with `deadline=DeadlineAlert()`
   > * That gets stored in the DAG object as `deadline: list[DeadlineAlert]`
   > * ((Changes start here)):  When the DAG is serialized, each DeadlineAlert 
is written to the deadline_alert table and the serialized_dag stores a 
list[str] containing the table row reference instead of storing the entire 
serialized DeadlineAlert, and only fetched/expanded from the UUID when needed 
(usually at dagrun creation)
   > 
   > I think your only real issue would be that 
task-sdk/src/airflow/sdk/definitions/deadline.py imports `from 
airflow.models.deadline import ReferenceModels` which shouldn't have crossed 
over? Other than maybe needing to untangle that, I think it's fairly 
straight-forward. On the SDk side you'd have
   > 
   > ```
   > SDK (task-sdk/):
   > ├─ deadline.py
   >    ├─ DeadlineAlert (user-facing, no changes)
   >    └─ DeadlineReference (public interface, needs to NOT import 
ReferenceModels)
   > ```
   > 
   > and on the Core side:
   > 
   > ```
   > Core (airflow-core/src/airflow/):
   > ├─ serialization/definitions/
   >    ├─ deadline.py
   >    │  ├─ SerializedDeadlineAlert 
   >    │  └─ SerializedDeadlineReference ((I think maybe wrapping 
ReferenceModels here would untangle the cross-referencing??? Maybe???))
   >    └─ reference_models.py
   >       └─ Move the DeadlineReference implementation out of 
models/deadline.py?)
   > │
   > ├─ serialization/ ((new - pull these out of serialized_objects.py))
   >    ├─ encoders/deadline.py
   >    └─ decoders/deadline.py
   > │
   > └─ models/
   >     ├─ deadline.py (keep Deadline ORM model, remove ReferenceModels)
   >     └─ deadline_alert.py (keep DeadlineAlert ORM model)
   > ```
   > 
   > One thing that may simplify (or maybe complicate?) things after PR 
[#58248](https://github.com/apache/airflow/pull/58248) is that deserialization 
will now be a 2-step process and we only deserialize "on demand":
   > 
   > * Step 1 - deserialize DAG: `dag.deadline` stays as `list[str]` of UUIDs - 
no Core types needed
   > * Step 2 - expand on use: `_process_dagrun_deadline_alerts()` expands 
UUIDs to full objects - uses Core types
   > 
   > I'm not entirely sure how that would look; maybe it'll simplify things for 
you.
   > 
   > I think the only other major thing I have in mind for Deadlines was based 
on a comment in another PR that suggested renaming some internal stuff, which I 
guess would make sense after you do your reorganization. But IFF you think it's 
cleaner to do it when you move them, I was going to internally rename 
`DeadlineAlerts` and related object names to `deadline_def` - Short for 
`deadline definition`. Currently the user defines the `DeadlineAlert` in their 
Dag definition, the `DeadlineAlert` gets stored in the dag object as `deadline` 
(which is now a list of deadlines, making it even more confusing) and the 
`deadline` is used to calculate a `deadline`. Naming got pretty confusing and 
it's much more intuitive seeing "The user defines a Deadline Alert, stored as 
`deadline_def`, and that definition is used to calculate a `deadline`". But as 
I said, that's likely better as an entirely separate project after the move, in 
order to keep the changes easier to follow.
   > 
   > Also, once PR [#58248](https://github.com/apache/airflow/pull/58248) is 
merged, I think we can finally safely drop `dag._deadline` entirely and that 
should simplify the split I think? That's likely not part of your split, but 
maybe it could make it easier?
   
   @ferruzzi I saw it in this comment, maybe it was added by mistake?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to