wolfier commented on issue #56196:
URL: https://github.com/apache/airflow/issues/56196#issuecomment-3359430986

   > What would you expect to happen in this case?
   
   If the task instance in the above case fails, the task state will move 
directly to `failed` and when 
[finalized](https://github.com/apache/airflow/blob/3.1.0/task-sdk/src/airflow/sdk/execution_time/task_runner.py#L1452)
 the `on_failure_callback` will be executed.
   
   This is not the situation described in the issue as this is the code path 
where the task execution command completed successfully. The original issue is 
when the scheduler handles the failure of an externally killed task instance.
   
   > there are 3 retries but no retries will ever be invoked. Do you expect 
on_retry_callback to be called?
   
   I do not expect `on_retry_callback` to be invoked.
   
   ---
   
   There are two distinct code paths.
   
   * The task execution command fails unexpectedly which is generally 
considered as the task instance killed externally. This leads to the 
**scheduler** sending the `on_failure_callback` to the dag processor to be 
invoked.
     * If `on_failure_callback` and `on_retry_callback` is set, the 
`on_failure_callback` is invoked.
     * If `on_failure_callback` is set and `on_retry_callback` is not set, the 
`on_failure_callback` is invoked.
     * If `on_failure_callback` is not set and `on_retry_callback` is set, a 
[warning](https://github.com/apache/airflow/blob/3.1.0/airflow-core/src/airflow/dag_processing/processor.py#L309-L322)
 is printed. (I believe)
   * The task execution command completes successfully and the execution 
results in an exception. This leads to the **worker** [resolving the task 
instance 
state](https://github.com/apache/airflow/blob/3.1.0/task-sdk/src/airflow/sdk/execution_time/task_runner.py#L1450)
 and [finalizing the task 
instance](https://github.com/apache/airflow/blob/3.1.0/task-sdk/src/airflow/sdk/execution_time/task_runner.py#L1452)
 which includes executing callbacks.
   
   Currently in Airflow 3.1, when the task execution command fails 
unexpectedly, the `on_failure_callback` is invoked. If the task instance is 
eligible for retry, the task will be retried. I believe the behaviour should be 
the `on_retry_callback` is executed if present otherwise no callback is invoked 
since the task did enter the `failed` state.
   
   > You describe a case where the task is killed. Killed = task goes to 
failure state which means that on_failure_callback must be executed so the 
issue here is not if on_failure_callback should be executed - the answer to 
that is always yes. 
   
   To clarify, it is only the **instance** of task execution that is "killed". 
I used the term "killed externally" because that is how Airflow describes the 
situation. The task instance can be scheduled and queued again if it is 
eligible for retry. Additionally, the task instance never reached the `failed` 
state. Since the task instance is eligible for retry, the task instance state 
is set to `up_for_retry`.
   
   > The issue here is if on_retry_callback should be called as well.
   
   I think only the `on_retry_callback` should be executed if the task is 
eligible for retries. If the task is not eligible for retires, the current 
behaviour is sufficient thought the implied `on_failure_callback` execution can 
be confusing.
   
   ```python
       request = TaskCallbackRequest(
           filepath=ti.dag_model.relative_fileloc,
           bundle_name=ti.dag_version.bundle_name,
           bundle_version=ti.dag_version.bundle_version,
           ti=ti,
           msg=msg,
           context_from_server=TIRunContext(
               dag_run=DRDataModel.model_validate(ti.dag_run, 
from_attributes=True),
               max_tries=ti.max_tries,
               variables=[],
               connections=[],
               xcom_keys_to_clear=[],
           ),
           task_callback_type=TaskInstanceState.FAILED,   # Explicitly denote 
the task callback type
       )
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to