hussein-awala commented on PR #32990:
URL: https://github.com/apache/airflow/pull/32990#issuecomment-1689030674
> So, what was the case before this change? IIUC:
> execution_timeout applied to the overall execution of the task (e.g. duration of a single run of execute)
Yes, but when it's reached, we don't execute the `on_kill` method.
> task deferral timeout is applied to the duration of a single deferral, and ignores overall execution_timeout
No, we take the minimum of the deferral timeout and `execution_timeout` when both are present.
> if task comes out of deferral then the execution_timeout is applied again to the task from the moment it resumes execution. so that for example if a task runs, defers, and resumes, and the execution timeout is 1 hr, then the task could run for 1 hr + deferral timeout + 1 hr after resuming.
No, when the task is deferred, we calculate the timeout as:
```python
# Calculate timeout too if it was passed
if defer.timeout is not None:
    self.trigger_timeout = timezone.utcnow() + defer.timeout
else:
    self.trigger_timeout = None
# If an execution_timeout is set, set the timeout to the minimum of
# it and the trigger timeout
execution_timeout = self.task.execution_timeout
if execution_timeout:
    if self.trigger_timeout:
        self.trigger_timeout = min(self.start_date + execution_timeout, self.trigger_timeout)
    else:
        self.trigger_timeout = self.start_date + execution_timeout
```
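To make that `min()` rule concrete, here is a small worked example with made-up timestamps (the values are illustrative, not from the PR):

```python
from datetime import datetime, timedelta

# Hypothetical values for illustration only
start_date = datetime(2023, 8, 1, 12, 0)        # task started at 12:00
now = start_date + timedelta(minutes=10)        # task defers 10 minutes in
defer_timeout = timedelta(minutes=45)           # timeout passed to self.defer
execution_timeout = timedelta(hours=1)

# Same logic as the snippet above: deferral deadline vs execution deadline
trigger_timeout = now + defer_timeout                                   # 12:55
trigger_timeout = min(start_date + execution_timeout, trigger_timeout)  # min(13:00, 12:55)

print(trigger_timeout)  # 2023-08-01 12:55:00 — the deferral timeout fires first here
```

Whichever deadline comes first wins, so a deferral can never outlive the overall execution timeout.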
And when the task resumes, we calculate the remaining timeout duration as:
```python
timeout_seconds = (
    task_to_execute.execution_timeout - (timezone.utcnow() - self.start_date)
).total_seconds()
```
The main issue in operators (not sensors) was in how we handled this timeout: it was treated as a normal trigger failure.
> it makes me a bit uncomfortable to remove the feature to be able to set a specific deferral timeout on a given deferral. not that it's necessarily all that useful, but it's there.
It's still possible, and you can now also control the callback that runs when this timeout is reached.
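A self-contained sketch (NOT Airflow code) of what overriding that callback could look like; the names `AirflowDeferTimeout` and `on_defer_timeout` come from this PR, while the classes around them are purely illustrative:

```python
# Illustrative sketch of the PR's default-vs-override callback behaviour.
class AirflowDeferTimeout(Exception):
    """Raised by default when a deferral timeout is reached."""

class BaseDeferrableOperator:
    def on_defer_timeout(self, context):
        # Default behaviour described in the PR: fail by raising.
        raise AirflowDeferTimeout("deferral timeout reached")

class MyOperator(BaseDeferrableOperator):
    def on_defer_timeout(self, context):
        # User override: swallow the timeout and return a value instead.
        return "partial-result"
```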
> execution_timeout will be applied to the overall duration of the task instance, no matter how many defer-resume cycles there are.
Yes, but this was already the case before.
> if the execution_timeout is 1 hr, then now if phase 1 takes 10 minutes, the deferral timeout will be set to 50 minutes. then if task is deferred for 30 minutes before resuming, then when task resumes, effective execution timeout will be 30 minutes. and so on.
20 minutes, not 30 (1h - 10m - 30m).
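The budget accounting above, spelled out with `timedelta` arithmetic:

```python
from datetime import timedelta

execution_timeout = timedelta(hours=1)    # overall budget
phase_1 = timedelta(minutes=10)           # running before the deferral
deferred = timedelta(minutes=30)          # time spent deferred

# The deferral consumes the budget too, so only 20 minutes remain on resume.
remaining = execution_timeout - phase_1 - deferred
print(remaining)  # 0:20:00
```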
> timeout will be ignored if passed to self.defer
If it is the earliest timeout, it will be used and the `on_defer_timeout` callback will be called; otherwise we'll use the execution timeout or the sensor timeout (the user can also define other custom types of timeout).
The most important one is the sensor timeout: in sync mode, when this timeout is reached, we fail the task regardless of the number of remaining attempts, but this is not the case in the current async mode. This PR fixes that bug.
So in summary, this PR:
- processes the timeout based on its reason
- if it's because of the execution timeout, it fails the attempt and calls `on_kill`
- if it's because of the deferral timeout, it calls the new callback method, which by default raises `AirflowDeferTimeout`, but the user can override it to return values or do anything else
- if it's a sensor, and the timeout is because of the sensor timeout, it fails the TI immediately, regardless of the number of remaining attempts
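The summary above can be sketched as a dispatch on the timeout reason; the function and reason names here are hypothetical, not the PR's actual implementation:

```python
# Illustrative dispatch over the three timeout reasons listed above.
def handle_trigger_timeout(reason: str, is_sensor: bool) -> str:
    if is_sensor and reason == "sensor_timeout":
        # Sensor timeout: fail the TI immediately, ignoring remaining attempts.
        return "fail task immediately, ignoring remaining attempts"
    if reason == "execution_timeout":
        # Execution timeout: fail this attempt and call on_kill.
        return "fail attempt and call on_kill"
    if reason == "defer_timeout":
        # Deferral timeout: invoke the new callback.
        return "call on_defer_timeout (raises AirflowDeferTimeout by default)"
    return "unknown timeout reason"
```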
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]