akaul commented on PR #39373:
URL: https://github.com/apache/airflow/pull/39373#issuecomment-2091887128

   > Won't there need to be an update to the operators too, since they 
currently just call the sync cancel_run method? 
([DatabricksSubmitRunOperator](https://github.com/apache/airflow/blob/main/airflow/providers/databricks/operators/databricks.py#L577)
 | 
[DatabricksRunNowOperator](https://github.com/apache/airflow/blob/main/airflow/providers/databricks/operators/databricks.py#L875))?
   >
   > I think either that or the hook.cancel_run can accept a deferrable 
parameter which can be used to call a_cancel_run.
   
   I don't believe so, since the trigger executes in a completely different 
context than the operators.
   
   The changes here are specifically to account for the following scenario:
   - A `DatabricksSubmitRunOperator`/`DatabricksRunNowOperator` is launched 
with `deferrable=True`
   - After successfully creating a Databricks job, execution is deferred by 
running the `DatabricksExecutionTrigger`
   - A user goes in and manually marks the task as terminated in the UI, 
cancelling the trigger execution while the operator is in a deferred state
      - Based on local testing as well as the [explanation in this 
comment](https://github.com/apache/airflow/issues/36090#issuecomment-1854039714),
 the trigger gets terminated and control flow is never transferred back to the 
operator. Because of this, `on_kill` is never called.
   
   Currently, terminating the task via the UI in this situation doesn't 
propagate to Databricks that the job should be cancelled. From my local 
testing, the changes in this PR remedy that.
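   For illustration, the general pattern (which other providers use as well) can 
be sketched roughly like this. The class and method names below are simplified 
stand-ins for demonstration, not the actual provider code:

```python
import asyncio

class FakeHook:
    """Stand-in for the Databricks hook; names here are illustrative."""
    def __init__(self):
        self.cancelled = False

    async def a_get_run_state(self, run_id):
        # Pretend the remote job never finishes, so the trigger keeps polling.
        return "RUNNING"

    async def a_cancel_run(self, run_id):
        # Analogous to an async cancel call against the Databricks API.
        self.cancelled = True

class ExecutionTrigger:
    """Sketch of a trigger that cancels the remote run when its own
    coroutine is cancelled (e.g. the task is marked terminated in the UI)."""
    def __init__(self, hook, run_id, poll_interval=0.01):
        self.hook = hook
        self.run_id = run_id
        self.poll_interval = poll_interval

    async def run(self):
        try:
            while await self.hook.a_get_run_state(self.run_id) == "RUNNING":
                await asyncio.sleep(self.poll_interval)
            return {"state": "TERMINATED"}
        except asyncio.CancelledError:
            # Control never returns to the operator in this path, so
            # on_kill would not fire; cancel the remote job from inside
            # the trigger instead, then re-raise.
            await self.hook.a_cancel_run(self.run_id)
            raise

async def demo():
    hook = FakeHook()
    task = asyncio.create_task(ExecutionTrigger(hook, run_id=42).run())
    await asyncio.sleep(0.05)   # let the trigger poll a few times
    task.cancel()               # simulate the triggerer cancelling the task
    try:
        await task
    except asyncio.CancelledError:
        pass
    return hook.cancelled

# asyncio.run(demo()) → True: the remote run gets cancelled
```

   The key point is that the cancellation handler lives inside the trigger's 
own coroutine, which is the only place that still runs when the deferred task 
is killed.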
   
   I don't know if there are alternative approaches to address this, but this 
is pretty similar to the approach taken by other providers, as I noted in the 
PR description. I did initially attempt to implement a 
[Trigger.cleanup()](https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/triggers/base/index.html#airflow.triggers.base.BaseTrigger.cleanup)
 method, but that function can also be executed in the context of a triggerer 
restart, which isn't ideal behaviour since the Databricks job should continue 
running across triggerer restarts, especially for long-running Databricks jobs.

