yuqian90 commented on issue #19346:
URL: https://github.com/apache/airflow/issues/19346#issuecomment-957136738
I think the new behavior makes more sense, but I'm happy to hear scenarios
that make the old behavior preferable. If timeout=3600s and retries=5, the
task has a maximum of 3600s to finish. During that 3600s, it can retry at
most 5 times. If either the 3600s or the 5 retries are exhausted, the task
fails. So to address the problem initially reported in this issue, the user
just needs to increase the timeout.
@nathadfield This statement is not true: "if you have a long timeout
setting but the task retries for some other reason, then it will restart with
the timeout reset."
The timeout does not reset after the task fails for some other reason. Let's
say the timeout is 3600s and retries=1. After 600s, the sensor fails due to a
transient connection issue. When it retries, the timeout does not reset: it
still has exactly 3000s left to finish. If it's not done within those 3000s,
it'll fail.
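As a rough sketch of that arithmetic only: the helper name `remaining_timeout` and the timestamps below are hypothetical and not part of Airflow. The remaining budget is measured from the start of the first try, not from the start of the retry.

```python
from datetime import datetime, timedelta, timezone


def remaining_timeout(first_try_start, now, timeout_s):
    """Seconds left in the timeout budget, counted from the first try's start.

    Hypothetical helper for illustration; Airflow does not expose this function.
    """
    elapsed = (now - first_try_start).total_seconds()
    return max(0.0, timeout_s - elapsed)


# Illustrative timestamps: the first try starts, then fails 600s later.
first_try_start = datetime(2021, 11, 1, 12, 0, tzinfo=timezone.utc)
retry_start = first_try_start + timedelta(seconds=600)

# The retry does not get a fresh 3600s; it inherits what is left.
left = remaining_timeout(first_try_start, retry_start, timeout_s=3600)
print(left)
```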
This is because the piece of code below looks up the start time of the task's
first try.
```python
if self.reschedule:
    # If reschedule, use the start date of the first try (first try can be either the very
    # first execution of the task, or the first execution after the task was cleared.)
    first_try_number = context['ti'].max_tries - self.retries + 1
    task_reschedules = TaskReschedule.find_for_task_instance(
        context['ti'], try_number=first_try_number
    )
    if task_reschedules:
        started_at = task_reschedules[0].start_date
    else:
        started_at = timezone.utcnow()
```
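To make the `first_try_number` expression concrete, here is a minimal sketch that repeats only the arithmetic from the snippet above. The function wrapper and the sample values are assumptions chosen for illustration. In particular, `max_tries=3` after a clear is an assumed value, not something taken from a real run.

```python
def first_try_number(max_tries: int, retries: int) -> int:
    # Same arithmetic as the quoted snippet:
    # context['ti'].max_tries - self.retries + 1
    return max_tries - retries + 1


# Fresh run with retries=1, assuming max_tries starts out equal to retries.
fresh = first_try_number(max_tries=1, retries=1)

# After the task was cleared, assuming max_tries was raised to 3 (illustrative).
after_clear = first_try_number(max_tries=3, retries=1)

print(fresh, after_clear)
```

Under these assumptions the lookup points at try 1 on a fresh run and at try 3 after the clear, which matches the comment in the snippet: the "first try" is either the very first execution or the first execution after the task was cleared.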