YoannAbriel opened a new pull request, #62869:
URL: https://github.com/apache/airflow/pull/62869

   # fix: warn about hardcoded 24h visibility_timeout that kills long-running Celery tasks
   
   ## Problem
   
   When using the Celery executor with a Redis or SQS broker and no explicit `visibility_timeout` configured, Airflow silently applies a default of 86400 seconds (24 hours) in `default_celery.py`. Tasks running longer than 24 hours are effectively terminated: the broker redelivers their message to another worker, and that redelivered execution fails with `ServerResponseError('Invalid auth token: Signature has expired')`. Users get no indication that this limit exists or how to change it.
   
   The `task_acks_late` configuration description also incorrectly states that it "effectively overrides the visibility timeout". This is not true for Redis/SQS brokers: broker-level redelivery happens regardless of acknowledgment settings.
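To make the redelivery mechanism concrete, here is a toy, in-memory model of a Redis/SQS-style visibility timeout. It is an illustrative assumption, not kombu's or Airflow's implementation, and `ToyVisibilityQueue` and its methods are hypothetical names. A delivered message stays reserved until it is acknowledged; once `visibility_timeout` elapses without an ack, it becomes visible again and is handed to the next worker. Because a late-acking worker only acks after the task finishes, a task that outlives the timeout is redelivered while it is still running.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVisibilityQueue:
    """Hypothetical in-memory queue mimicking a broker visibility timeout."""

    visibility_timeout: float
    _ready: list = field(default_factory=list)
    # message id -> (payload, time it was handed to a worker)
    _unacked: dict = field(default_factory=dict)
    _next_id: int = 0

    def publish(self, payload: str) -> None:
        self._ready.append((self._next_id, payload))
        self._next_id += 1

    def restore_expired(self, now: float) -> None:
        # Any message reserved for at least the timeout without an ack is
        # assumed lost and made visible again.
        expired = [
            msg_id
            for msg_id, (_, delivered_at) in self._unacked.items()
            if now - delivered_at >= self.visibility_timeout
        ]
        for msg_id in expired:
            payload, _ = self._unacked.pop(msg_id)
            self._ready.append((msg_id, payload))

    def get(self, now: float):
        """Deliver the next visible message, or None if nothing is ready."""
        self.restore_expired(now)
        if not self._ready:
            return None
        msg_id, payload = self._ready.pop(0)
        self._unacked[msg_id] = (payload, now)
        return msg_id, payload

    def ack(self, msg_id: int) -> None:
        self._unacked.pop(msg_id, None)


HOUR = 3600
queue = ToyVisibilityQueue(visibility_timeout=24 * HOUR)
queue.publish("long_running_task")

first = queue.get(now=0)  # worker A picks up the task
# Worker A is still running at hour 25 and, acking late, has not acked yet.
second = queue.get(now=25 * HOUR)  # the same message is handed to worker B
print(first, second)  # (0, 'long_running_task') (0, 'long_running_task')
```

The model only captures the timing rule; it says nothing about what happens to worker A's process. Acking before the timeout (calling `ack` earlier) is the only thing that prevents the second delivery here, which is why raising `visibility_timeout` above the longest expected task runtime is the relevant knob.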
   
   ## Root Cause
   
   `get_default_celery_config()` in `default_celery.py` hardcodes 
`broker_transport_options["visibility_timeout"] = 86400` when no value is 
configured and the broker is Redis or SQS. This happens silently with no log 
output, so users don't know about the 24h limit until their tasks are killed. 
The `task_acks_late` documentation in `get_provider_info.py` compounds the confusion by claiming it overrides `visibility_timeout`.
   
   ## Fix
   
   Three changes in the celery provider:
   
   1. **`default_celery.py`**: Added a `log.warning()` that fires when the default 86400s `visibility_timeout` is applied, telling users about the limit and how to increase it via `[celery_broker_transport_options] visibility_timeout`.
   
   2. **`get_provider_info.py`** (`task_acks_late` description): Corrected the 
documentation to state that `task_acks_late` does NOT override 
`visibility_timeout` for Redis/SQS brokers, and that users must separately 
increase `visibility_timeout` for long-running tasks.
   
   3. **`get_provider_info.py`** (`visibility_timeout` description): Added a note that Airflow defaults to 86400s when no value is configured, and that tasks exceeding this value will be terminated.
   
   Added 3 unit tests:
   - `test_visibility_timeout_default_warns_when_not_configured` — verifies 
warning is logged with Redis broker
   - `test_visibility_timeout_no_warning_when_configured` — verifies no warning 
when explicitly set
   - `test_visibility_timeout_not_set_for_unsupported_broker` — verifies no 
warning/default for RabbitMQ
   
   Tests follow existing patterns in `test_celery_executor.py` (validated via 
CI).
   
   Closes: #62218
   
   ##### Was generative AI tooling used to co-author this PR?
   - [X] Yes — Claude Code
   


-- 
This is an automated message from the Apache Git Service.