mik-laj commented on issue #13941:
URL: https://github.com/apache/airflow/issues/13941#issuecomment-769636502


   The following description seems to me to belong in this documentation, but 
we should verify it.
   
   > We ensure isolation at the process level, and each process opens a new 
connection, so these components have many open connections. The new processes 
also allow us to circumvent GIL limitations, i.e. the problems with 
multi-threading in Python.
   >
   > - **Scheduler** processes files in a loop. For each file, we create a new 
process. The number of files processed simultaneously is controlled by 
`scheduler.max_threads` (Airflow 1.10) or `scheduler.parsing_processes` 
(Airflow 2.0). We recommend setting this option to the CPU count minus 1. 
Additionally, the main scheduler loop holds an open connection, and managing 
the processing of files takes place in a separate process/loop, which creates 
another connection. This means we already have `parsing_processes + 2` open 
connections at the same time.
   > - The main **webserver** process creates many gunicorn workers. The number 
of processes is controlled by the gunicorn options in the `webserver` section. 
In Airflow 1.10, each worker opened two connections to the database; in 
Airflow 2.0, I fixed this, and now each process opens only one connection. By 
default, we start 4 workers.
   > - **Worker** processes handle multiple tasks, and for each task, three 
processes and two connections are created. The number of tasks per worker is 
configurable via the `core.parallelism` option.
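
   The per-component arithmetic above can be sketched as a few small helpers. 
This is a rough estimate based only on the rules in the description (the 
function names, parameters, and the worked example are assumptions for 
illustration, not part of Airflow's API), and it should be verified against 
the actual code:

   ```python
   # Rough estimate of simultaneous DB connections per component,
   # following the rules described above (assumed, not authoritative).

   def scheduler_connections(parsing_processes: int) -> int:
       # One connection per DAG-file parsing process, plus the main
       # scheduler loop and the file-processing manager loop.
       return parsing_processes + 2

   def webserver_connections(gunicorn_workers: int, airflow2: bool = True) -> int:
       # Airflow 2.0: one connection per gunicorn worker;
       # Airflow 1.10: two connections per worker.
       per_worker = 1 if airflow2 else 2
       return gunicorn_workers * per_worker

   def worker_connections(running_tasks: int) -> int:
       # Two connections per running task, per the description above.
       return running_tasks * 2

   # Hypothetical example: a 4-CPU scheduler host (parsing_processes = 3),
   # the default 4 gunicorn workers, and 16 concurrently running tasks.
   total = (
       scheduler_connections(3)    # 3 + 2 = 5
       + webserver_connections(4)  # 4 * 1 = 4
       + worker_connections(16)    # 16 * 2 = 32
   )
   print(total)  # 41
   ```

   If the description is accurate, a deployment like this would hold roughly 
40 connections open at once, which is worth keeping in mind when sizing the 
database's connection limit or a pooler such as PgBouncer.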

