SameerMesiah97 opened a new pull request, #62422:
URL: https://github.com/apache/airflow/pull/62422

   **Description**
   
   This change makes `DatabricksReposCreateOperator` resilient to a race 
condition when multiple tasks attempt to create a repository at the same 
`repo_path` concurrently.
   
   Previously, the operator performed a `get_repo_by_path` check followed by 
`create_repo`. If two tasks ran at the same time, both could observe that the 
repository did not exist and both attempt creation. One task would succeed, 
while the other would fail with a 400 error from the Databricks API indicating 
that the repo already exists.
   
   The operator now treats this as a recoverable condition. If `create_repo` 
fails because the repo already exists, the operator re-fetches the repo ID via 
`get_repo_by_path`. If the repo is found, execution proceeds normally and 
preserves the existing `ignore_existing_repo` semantics.
   
   **Rationale**
   
   The previous implementation relied on a non-atomic existence check followed 
by creation. In concurrent DAG runs, this leads to a classic 
time-of-check to time-of-use (TOCTOU) race condition: two tasks can both pass 
the existence check and attempt creation, even though only one creation can succeed.
   
   Since repository creation is an external side-effect managed by the 
Databricks API, the operator cannot assume exclusivity or single-writer 
behavior. It must defensively handle the possibility that another task or DAG 
run creates the resource between the check and the create call. Handling this 
explicitly makes the operator more robust under concurrency without changing 
its single-run behavior.
   
   **Tests**
   
   Add tests that verify that:
   
   * the operator recovers when `create_repo` raises an “already exists” error 
by re-fetching the repo ID and proceeding successfully.
   * a genuine creation failure (where the repo still cannot be found after the 
error) is propagated and not silently swallowed.
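
The two cases above could be exercised roughly as follows. This is a sketch that uses `unittest.mock` against a compact, hypothetical `ensure_repo` helper standing in for the operator logic. `RepoApiError`, `ensure_repo`, and the single-argument `create_repo` call are assumptions for illustration, not the names or signatures used in the PR.

```python
import unittest
from unittest import mock


class RepoApiError(Exception):
    """Hypothetical stand-in for the error raised when the API call fails."""


def ensure_repo(hook, repo_path):
    """Hypothetical helper: create the repo, or re-fetch its ID on failure."""
    repo_id = hook.get_repo_by_path(repo_path)
    if repo_id is None:
        try:
            repo_id = hook.create_repo(repo_path)["id"]
        except RepoApiError:
            repo_id = hook.get_repo_by_path(repo_path)
            if repo_id is None:
                raise
    return repo_id


class TestEnsureRepo(unittest.TestCase):
    def test_recovers_when_repo_already_exists(self):
        hook = mock.Mock()
        # First lookup finds nothing; the lookup after the failed create
        # finds the repo that a concurrent task created.
        hook.get_repo_by_path.side_effect = [None, "123"]
        hook.create_repo.side_effect = RepoApiError("already exists")

        self.assertEqual(ensure_repo(hook, "/Repos/team/project"), "123")
        self.assertEqual(hook.get_repo_by_path.call_count, 2)

    def test_propagates_genuine_failure(self):
        hook = mock.Mock()
        # The repo is missing both before and after the failed create.
        hook.get_repo_by_path.side_effect = [None, None]
        hook.create_repo.side_effect = RepoApiError("bad request")

        with self.assertRaises(RepoApiError):
            ensure_repo(hook, "/Repos/team/project")
```

Saved as a module, these run with `python -m unittest`.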

