jscheffl commented on PR #63498:
URL: https://github.com/apache/airflow/pull/63498#issuecomment-4057449607

   > @jscheffl @dheerajturaga @ferruzzi
   > 
   > Hello! Following the multi-team work, I've been assigned to add 
functionality to the edge executor. As before, I've opened a draft PR first to 
discuss the direction before starting development.
   > 
   > This task involves changing the Deadline Alerts callback work so that it 
can be executed on the worker. The edge_executor is currently structured to 
directly update the task list in the `edge_job` table within the 
`queue_workload` method. It needs to be updated to handle not only the existing 
`workloads.ExecuteTask` but also the newly added 
[`workloads.ExecuteCallback`](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/executors/workloads/callback.py#L71)
 — and I'd like us to align on how to store this class in the DB. I see three 
possible approaches:
   > 
   >     * Fit it into the existing schema without changing the edge_job schema.
   >       This should work functionally, but it would require populating 
existing columns with meaningless values (dag_id, task_id, run_id).
   > 
   >     * Modify the edge_job schema.
   >       By adding a workload_type field, we can implicitly distinguish the 
expected schema per value. This could also be an opportunity to make changes to 
the PK and nullable constraints.
   > 
   >     * Add a separate edge_callback_job table.
   >       Fully separating the tables would guarantee clear and precise 
consistency.
   > 
   > 
   > All three approaches could work without any functional issues. I'd love to 
hear your thoughts.
   
   Thanks for raising this for a quick check. Actually, I have not thought about 
this before and have no strong opinion developed yet.
   
   A callback also belongs to a Dag and a Task, correct? So these details could 
still be used to populate the columns?
   
   Note that the table structure and PK originate from the Airflow 2 era, before 
a Task had a UUID. Therefore dag_id, task_id and run_id together formed the job 
PK. As a matter of change I'd be OK with changing the DB layout and switching to 
the Task instance UUID (some other artificial UUID would also be OK) while 
keeping the existing fields as informational only. In this case, if these 
columns are no longer part of the PK, you could fill them with empty values or a 
placeholder where not relevant/existing. But this would also impact the REST API 
services, so maybe some adjustments are also needed there (e.g. 
/jobs/state/...) as they carry the existing identifiers.
   
   Note that Core and worker versions might diverge a bit, so if the REST API is 
changed, at least the existing endpoints need to be kept for compatibility.
   
   If it is technically not too much overhead, I'd slightly prefer a single 
table for all workload types; then only one query is needed to poll for jobs.
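   To illustrate what I mean by a single table with a discriminator (this is a 
hypothetical sketch only; `workload_type`, the placeholder handling, and the 
field names are assumptions for discussion, not the actual edge_job schema or 
the real workload classes):

```python
# Hypothetical sketch: map both workload kinds onto one edge_job-style
# row with a workload_type discriminator, so a single poll query can
# serve tasks and callbacks alike. The dataclasses below only stand in
# for the real workloads.ExecuteTask / workloads.ExecuteCallback.
from dataclasses import dataclass


@dataclass
class ExecuteTask:
    dag_id: str
    task_id: str
    run_id: str


@dataclass
class ExecuteCallback:
    callback_id: str  # hypothetical identifier for the callback


def edge_job_row(workload) -> dict:
    """Build the row to insert in queue_workload (sketch)."""
    if isinstance(workload, ExecuteTask):
        return {
            "workload_type": "task",
            "dag_id": workload.dag_id,
            "task_id": workload.task_id,
            "run_id": workload.run_id,
        }
    if isinstance(workload, ExecuteCallback):
        # Placeholders where the old PK columns carry no meaning,
        # possible once they are informational rather than the PK.
        return {
            "workload_type": "callback",
            "dag_id": "",
            "task_id": "",
            "run_id": "",
            "callback_id": workload.callback_id,
        }
    raise TypeError(f"unsupported workload: {type(workload).__name__}")
```

   With this shape, the worker's polling side stays a single query filtered (or 
not) on `workload_type`, instead of polling two tables.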
   
   Other opinions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to