I believe multiprocessing uses os.fork() on Unix systems, so we can take
advantage of copy-on-write (COW) to reduce RAM usage. However, AFAIK on Windows
each child process re-imports all module-level imports and thus may require
some extra RAM. I don't think the extra imports will put a big burden on RAM
usage (or maybe our scheduler box is just too big :P). I'll add an entry in
UPDATING.md regarding this config line.
About the SQLA connection, you actually have a point. On Windows we might
end up configuring 16 extra connection pools while re-importing. And since
subprocesses spun up by the multiprocessing module do not run atexit()
handlers, we might in theory leave some hanging connections there. However,
from my observation and testing, SQLA initializes connections lazily, so at
most we have an empty pool in each subprocess.
I might be wrong about the Windows behavior and about SQLA's lazy
initialization; I'm open to discussing better handling if that is the case.
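To make the lazy-initialization point concrete, here is a minimal sketch that is independent of Airflow. It assumes SQLAlchemy is installed and uses an in-memory SQLite URL with an explicit `QueuePool` purely for illustration; the helper name `pool_counts` is hypothetical and not part of Airflow or SQLAlchemy.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool


def pool_counts(engine):
    """Return (idle connections held by the pool, connections checked out)."""
    return engine.pool.checkedin(), engine.pool.checkedout()


# Assumption: in-memory SQLite stands in for the real metadata database.
engine = create_engine(
    "sqlite://", poolclass=QueuePool, pool_size=5, pool_recycle=3600
)

# Creating the engine only configures the pool; no DBAPI connection exists yet.
fresh = pool_counts(engine)

# The first real connection is opened on demand, when we ask for one.
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
    in_use = pool_counts(engine)

# After the block exits, the connection goes back to the pool and stays open.
returned = pool_counts(engine)

if __name__ == "__main__":
    print("fresh engine:   ", fresh)
    print("while connected:", in_use)
    print("after release:  ", returned)
    print(engine.pool.status())
```

If this holds for the backend in use, a subprocess that only imports the settings module and never queries the database should hold an empty pool, which matches the `Connections in pool: 0` lines in the output below.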
FYI this is the test script/result I was playing with:
```
▶ cat test.py
import os
from multiprocessing import Pool

print('execute module code')


def test_func(num):
    print(num)


if __name__ == '__main__':
    pool = Pool(4)
    results = pool.map(test_func, [1, 2, 3, 4], 1)
    pool.close()
    pool.join()
▶ python test.py
execute module code
1
2
3
4
-------------- mimic Windows behavior ----------------
▶ cat test.py
import os
from multiprocessing import Pool

print('execute module code')


def test_func(num):
    from airflow import settings
    print(num)
    print(settings.engine.pool.status())


if __name__ == '__main__':
    pool = Pool(4)
    results = pool.map(test_func, [1, 2, 3, 4], 1)
    pool.close()
    pool.join()
▶ python test.py
execute module code
execute module code
execute module code
execute module code
execute module code
airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting up DB connection pool (PID 80204)
airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting up DB connection pool (PID 80202)
airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting up DB connection pool (PID 80201)
airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting up DB connection pool (PID 80203)
airflow.settings [2018-09-07 17:37:24,219] {{settings.py:176}} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=3600
airflow.settings [2018-09-07 17:37:24,219] {{settings.py:176}} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=3600
airflow.settings [2018-09-07 17:37:24,219] {{settings.py:176}} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=3600
airflow.settings [2018-09-07 17:37:24,219] {{settings.py:176}} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=3600
Pool size: 5 Connections in pool: 0 Current Overflow: -5 Current Checked out connections: 0
Pool size: 5 Connections in pool: 0 Current Overflow: -5 Current Checked out connections: 0
Pool size: 5 Connections in pool: 0 Current Overflow: -5 Current Checked out connections: 0
Pool size: 5 Connections in pool: 0 Current Overflow: -5 Current Checked out connections: 0
airflow.utils.log.logging_mixin.LoggingMixin [2018-09-07 17:37:24,332] {{__init__.py:42}} DEBUG - Cannot import due to doesn't look like a module path
airflow.utils.log.logging_mixin.LoggingMixin [2018-09-07 17:37:24,332] {{__init__.py:42}} DEBUG - Cannot import due to doesn't look like a module path
airflow.utils.log.logging_mixin.LoggingMixin [2018-09-07 17:37:24,332] {{__init__.py:42}} DEBUG - Cannot import due to doesn't look like a module path
airflow.utils.log.logging_mixin.LoggingMixin [2018-09-07 17:37:24,332] {{__init__.py:42}} DEBUG - Cannot import due to doesn't look like a module path
4
1
2
3
Pool size: 5 Connections in pool: 0 Current Overflow: -5 Current Checked out connections: 0
Pool size: 5 Connections in pool: 0 Current Overflow: -5 Current Checked out connections: 0
Pool size: 5 Connections in pool: 0 Current Overflow: -5 Current Checked out connections: 0
Pool size: 5 Connections in pool: 0 Current Overflow: -5 Current Checked out connections: 0
```
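The second script above does not show how the Windows behavior was mimicked. One way to reproduce it, as a sketch and not necessarily what was done here, is to request the `spawn` start method explicitly through `multiprocessing.get_context` (Python 3.4+). The function names `report_pid` and `run_pool` are made up for this example. The default start method differs across platforms and Python versions, so the sketch names each method explicitly instead of relying on the default.

```python
import multiprocessing as mp
import os

# Module-level side effect: under "spawn" each worker re-imports this file,
# so this line runs again in every child; under "fork" it runs once.
print(f"execute module code (PID {os.getpid()})")


def report_pid(num):
    """Return the task number together with the PID of the process that ran it."""
    return num, os.getpid()


def run_pool(start_method, tasks=(1, 2, 3, 4)):
    """Run report_pid in a 4-worker pool created with the given start method."""
    ctx = mp.get_context(start_method)
    with ctx.Pool(4) as pool:
        return pool.map(report_pid, tasks, 1)


if __name__ == "__main__":
    # "spawn" is the only method available on Windows; "fork" is Unix-only.
    for method in ("fork", "spawn"):
        if method not in mp.get_all_start_methods():
            continue
        print(f"--- start method: {method} ---")
        results = run_pool(method)
        worker_pids = sorted({pid for _, pid in results})
        print(f"tasks ran in worker PIDs {worker_pids}")
```

Run as a script file (spawned workers need to re-import it by path), the `spawn` section should repeat the "execute module code" line once per worker, while the `fork` section should not, which is the difference visible between the two outputs above.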
[ Full content available at:
https://github.com/apache/incubator-airflow/pull/3830 ]