shenoykarthikd opened a new issue #11266: URL: https://github.com/apache/airflow/issues/11266
The FAQ documentation for max_threads currently reads as follows:

max_threads: Scheduler will spawn multiple threads in parallel to schedule dags. This is controlled by max_threads with default value of 2. User should increase this value to a larger value (e.g numbers of cpus where scheduler runs - 1) in production.

The example in parentheses confuses new developers, who read it as saying that the scheduler's thread count cannot exceed the number of CPUs minus 1. I have seen many Airflow installations where the value is set to the number of CPUs minus 1, whereas the upper limit should actually be determined by the size of the instance (CPU and memory) on which the scheduler is installed.

Because of this misunderstanding, I have heard many new Airflow developers say that Airflow is very slow at scheduling DAGs. When I look more closely at their configuration, I find max_threads limited to the number of CPUs.

Kindly consider changing the text to the following:

max_threads: Scheduler will spawn multiple threads in parallel to schedule dags. This is controlled by max_threads with default value of 2. User should increase this value to a larger value that fits the size of the installed hardware in production.
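For readers unsure where this setting lives, here is a minimal sketch, assuming an Airflow 1.10.x style airflow.cfg in which max_threads sits under the [scheduler] section. It uses only Python's standard configparser, not Airflow's own configuration loader. The config fragment, the value 8, and the helper name read_max_threads are hypothetical and for illustration only; 8 is not a recommended setting.

```python
import configparser

# Hypothetical airflow.cfg fragment. The value 8 is illustrative only,
# not a recommendation; size it to the scheduler host's CPU and memory.
AIRFLOW_CFG = """
[scheduler]
max_threads = 8
"""


def read_max_threads(cfg_text: str, default: int = 2) -> int:
    """Return [scheduler] max_threads from config text.

    Falls back to `default` (2, the default quoted in the FAQ text)
    when the section or option is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getint("scheduler", "max_threads", fallback=default)


if __name__ == "__main__":
    print(read_max_threads(AIRFLOW_CFG))
```

The point of the sketch is only that the value is a plain integer chosen by the operator; nothing in the setting itself ties it to the CPU count.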
