ron-gaist opened a new issue, #56571:
URL: https://github.com/apache/airflow/issues/56571

   ### Apache Airflow version
   
   3.1.0
   
   ### If "Other Airflow 2/3 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   Our large-scale setup includes:
   * ~1000 Celery executor workers
   * 15 API servers with 64 worker processes each (resource utilization was checked and stays within limits)
   
   Possibly also relevant:
   * 6 scheduler replicas
   * 2 DAG processors
   * a pgbouncer with a large enough `airflow` connection pool (it never reaches its maximum)
   * DAGs with up to 8k parallel tasks plus a final task that depends on all of them;
     typical DAGs are smaller, averaging ~5k tasks
   
   When all workers are active and working on task instances, each of them logs the following warning 4 times:
   **[warning] Starting call to 'airflow.sdk.api.client.Client.request', this is the %d time calling it. [airflow.sdk.api.client]**
   and on the 5th attempt they hit this error:
   **[error] Task execute_workload[$celery_task_uuid] raise unexpected: ReadTimeout('timed out') [celery.app.trace]**
   
   We investigated the error a little and found that it comes from httpx's default timeout.
   From the httpx docs (https://www.python-httpx.org/advanced/timeouts/):
   ```
   HTTPX is careful to enforce timeouts everywhere by default.
   The default behavior is to raise a TimeoutException after 5 seconds of network inactivity.
   ```
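   
   For context, httpx lets callers override these defaults per client. A minimal sketch of raising only the read timeout (the one being hit here); the 60 s value is purely illustrative:
   
   ```python
   import httpx
   
   # httpx defaults to 5 seconds for connect, read, write and pool operations.
   # Override just the read timeout (which triggers ReadTimeout) while keeping
   # the 5 s default elsewhere -- 60 s is an illustrative value.
   timeout = httpx.Timeout(5.0, read=60.0)
   client = httpx.Client(timeout=timeout)
   ```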
   
   ### What you think should happen instead?
   
   Airflow should allow users to configure this timeout via `airflow.cfg` to accommodate high-load deployments.
   For example:
   ```
   [api]
   httpx_timeout = 5   # httpx's current default
   ```
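   
   A rough sketch of how the Task SDK client could pick this up; the option name and the wiring are assumptions for illustration, not the current API:
   
   ```python
   # Hypothetical wiring -- "httpx_timeout" and how the Task SDK client accepts
   # a timeout are assumptions; conf.getfloat() is the standard Airflow way to
   # read a float option with a fallback.
   import httpx
   from airflow.configuration import conf
   
   api_timeout = conf.getfloat("api", "httpx_timeout", fallback=5.0)
   
   # The underlying httpx client used by airflow.sdk.api.client.Client would
   # then be built with this value instead of the hard-coded 5 s default.
   timeout = httpx.Timeout(api_timeout)
   ```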
   It may also be worth adding a docs section on best practices for keeping the API server reliable under very high load.
   
   
   ### How to reproduce
   
   (1) Run Airflow in a Kubernetes cluster with:
   ~ 1k Celery workers
   ~ 15 API server replicas (64 worker processes each; resource limits: 25Gi RAM, 8 CPU cores)
   
   (2) Run DAGs large enough that all 1k workers execute tasks in parallel, with each task taking more than 5 minutes (a sketch of such a DAG follows these steps).
   
   (3) Observe the workers for `ReadTimeout` errors.
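   
   For illustration, a DAG along these lines reproduces the load shape from step (2); the task count and sleep duration are placeholders, not our production DAGs:
   
   ```python
   import time
   
   from airflow.sdk import dag, task  # Airflow 3 public interface
   
   
   @dag(schedule=None, catchup=False)
   def many_parallel_tasks():
       @task
       def busy_work():
           # Keep the worker slot busy for more than 5 minutes.
           time.sleep(6 * 60)
   
       @task
       def final():
           pass
   
       done = final()
       # 8000 independent tasks, all upstream of the single final task.
       for i in range(8000):
           busy_work.override(task_id=f"busy_work_{i}")() >> done
   
   
   many_parallel_tasks()
   ```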
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-celery==3.12.2
   apache-airflow-providers-common-compat==1.7.3
   apache-airflow-providers-common-io==1.6.2
   apache-airflow-providers-common-sql==1.27.5
   apache-airflow-providers-standard==1.6.0
   apache-airflow-providers-postgres==6.2.3
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   The problem occurs every time all workers are executing task instances (the highest load).
   logs:
   ```
   [warning] Starting call to 'airflow.sdk.api.client.Client.request', this is 
the 1st time calling it. [airflow.sdk.api.client]
   [warning] Starting call to 'airflow.sdk.api.client.Client.request', this is 
the 2nd time calling it. [airflow.sdk.api.client]
   [warning] Starting call to 'airflow.sdk.api.client.Client.request', this is 
the 3rd time calling it. [airflow.sdk.api.client]
   [warning] Starting call to 'airflow.sdk.api.client.Client.request', this is 
the 4th time calling it. [airflow.sdk.api.client]
   [error] Task execute_workload[a7469ad-3481-4fd4-b8f236b37cf1] raise 
unexpected: ReadTimeout('timed out') [celery.app.trace]
   ```
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

