moomindani commented on issue #52280:
URL: https://github.com/apache/airflow/issues/52280#issuecomment-4571822559

   Thanks for the ping @eladkal — looking at this from the Databricks side.
   
   A couple of updates worth folding in before settling on direction:
   
   **1. The "wait for Airflow 3.1" rationale in the issue is largely obsolete**
   
   Timeline-wise, the issue was filed 2025-06 when 3.0 had just shipped. As of 
today, 3.1 (released 2025-09) and 3.2 (2026-04) are both out, and the relevant 
plugin extensibility is in place:
   
   - `fastapi_apps` is available since 3.0 — 
`airflow-core/src/airflow/plugins_manager.py:234`, with working examples in 
`providers/edge3/.../edge_executor_plugin.py`. So the HTTP endpoint piece 
doesn't need to wait.
   - `react_apps` (3.1+) is what enables a richer "pick which tasks to repair" 
UI, if we want it.
   - Auth dependencies are in place: `GetUserDep` and 
`permitted_dag_filter_factory` in 
`airflow-core/src/airflow/api_fastapi/core_api/security.py` cover the 
equivalent of the current FAB view's `auth.has_access_dag("POST", 
DagAccessEntity.RUN)`.
   - The "no direct DB access" rule applies to workers, triggerers, and the DAG 
file processor, not the API server, so the FastAPI handler can talk to the 
metadata DB normally.
   
   Of the 4 goals listed in the issue, goals 2 (FastAPI endpoint), 3 
(auth/authz), and 4 (defer to 3.1) are essentially resolved by what Airflow has 
shipped since. Only goal 1 (timing mismatch) is still an open design question.
   
   **2. The timing mismatch — three shapes worth comparing**
   
   The repair link is pre-computed at execution time, but which tasks need 
repair is only known once the run has failed. That mismatch is the real 
architectural challenge. Three shapes I'd consider:
   
   - **A. Server-side resolution (minimal)**: XCom stores a static endpoint URL 
`/databricks/repair/{dag_id}/{run_id}`. The FastAPI handler resolves failed 
Databricks tasks server-side at click time, calls `repair_run`, and clears 
matching Airflow TIs. Same UX as today, implementable on 3.0.
   - **B. Two-step UX (richer)**: XCom stores an external view URL; the view 
fetches failed tasks via FastAPI and lets the user pick which to repair. Closer 
to the Databricks-native repair flow but really wants `react_apps` to look 
right.
   - **C. Operator-level repair**: A `DatabricksRepairFailedOperator` added 
downstream with `trigger_rule="one_failed"` that auto-detects failed Databricks 
tasks and repairs them. Sidesteps UI plugins entirely; can co-exist with A or B.
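
   To make shape A concrete, here is a minimal, framework-free sketch of the 
click-time resolution step. It assumes the Databricks run description has the 
Jobs API `runs/get` shape (`tasks[].task_key`, `tasks[].state.result_state`) 
and that a repair request takes `run_id` plus `rerun_tasks`. The helper names 
and the set of states treated as failed are hypothetical, and the auth check 
and Airflow TI clearing are deliberately left out.

```python
from typing import List, Optional
from urllib.parse import quote

# Assumption: which Databricks result states should trigger a repair.
FAILED_STATES = {"FAILED", "TIMEDOUT"}


def repair_url(dag_id: str, run_id: str) -> str:
    """Static link that could be stored in XCom at execution time."""
    # Airflow run_ids often contain ':' and '+', so percent-encode each segment.
    return f"/databricks/repair/{quote(dag_id, safe='')}/{quote(run_id, safe='')}"


def failed_task_keys(run: dict) -> List[str]:
    """Resolve failed Databricks tasks from a run description at click time."""
    return [
        task["task_key"]
        for task in run.get("tasks", [])
        if task.get("state", {}).get("result_state") in FAILED_STATES
    ]


def build_repair_payload(run: dict) -> Optional[dict]:
    """Build the repair request body, or None when nothing needs repair."""
    keys = failed_task_keys(run)
    if not keys:
        return None
    return {"run_id": run["run_id"], "rerun_tasks": keys}


# Hypothetical run description, trimmed to the fields used above.
example_run = {
    "run_id": 123,
    "tasks": [
        {"task_key": "extract", "state": {"result_state": "SUCCESS"}},
        {"task_key": "transform", "state": {"result_state": "FAILED"}},
        {"task_key": "load", "state": {"result_state": "TIMEDOUT"}},
    ],
}

payload = build_repair_payload(example_run)
```

   In a real handler the payload would presumably be handed to the hook's 
`repair_run`, and the returned task keys mapped back to Airflow task IDs 
before clearing the matching TIs. That mapping is provider-specific and not 
shown here.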
   
   My read: A restores parity for Airflow 3 users with the smallest surface, B 
becomes more attractive once the project commits to a React plugin UI, and C is 
independently useful for the "DAG-native repair" crowd regardless of which UI 
path lands.

