I believe multiprocessing uses os.fork() on Unix systems, so we can take 
advantage of copy-on-write (COW) to reduce RAM usage. However, AFAIK on Windows 
each child process re-imports all module-level imports and may therefore need 
some extra RAM. I don't think the extra imports will put a big burden on RAM 
usage (or maybe our scheduler box is just too big :P). I'll add an entry in 
UPDATING.md regarding this config line.
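
To make the fork vs. re-import difference concrete, here is a minimal sketch using only the standard library. The helper names (`worker`, `reimports_main_module`, `run_in_pool`) are hypothetical and not part of Airflow; the only real APIs used are `multiprocessing.get_context()` and `Pool.map()`.

```python
import multiprocessing as mp
import os


def worker(num):
    """Return the worker's PID with the input so the caller can see who ran it."""
    return os.getpid(), num


def reimports_main_module(method):
    """Hypothetical helper: only 'fork' children inherit the parent's
    already-imported modules; 'spawn' and 'forkserver' children start from
    a fresh interpreter and import the main module again."""
    return method != "fork"


def run_in_pool(method, items):
    """Map `worker` over `items` using an explicit start method."""
    # get_context() raises ValueError if the method is unsupported on this OS
    # (for example, 'fork' on Windows).
    ctx = mp.get_context(method)
    with ctx.Pool(2) as pool:
        return pool.map(worker, items, 1)
```

Calling `run_in_pool("spawn", [1, 2, 3, 4])` from inside an `if __name__ == '__main__':` guard on a Unix box should approximate the Windows behavior, with module-level code executed again in each worker, whereas `run_in_pool("fork", ...)` should not re-execute it.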

About the SQLA connection, you actually have a point. On Windows we might 
end up configuring 16 extra connection pools during the re-import. And since 
subprocesses spun up by the multiprocessing module do not run atexit() 
handlers, we might in theory leave some hanging connections there. However, 
from my observation and testing, SQLA initializes connections lazily, so at 
most we have an empty pool in each subprocess.
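
The lazy behavior can be checked in isolation with a minimal sketch like the one below. It assumes SQLAlchemy is installed and uses an in-memory SQLite URL with an explicit `QueuePool` purely for illustration; Airflow's actual engine is configured in `settings.py` and may differ. `pool_counts` is a hypothetical helper.

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool


def pool_counts(engine):
    """Return (idle connections in pool, checked-out connections)."""
    pool = engine.pool
    return pool.checkedin(), pool.checkedout()


# Assumption: in-memory SQLite stands in for the real metadata database.
engine = create_engine("sqlite://", poolclass=QueuePool, pool_size=5)

# Creating the engine configures the pool but opens no connection yet.
before = pool_counts(engine)

# The first real connection is only made when something asks for one.
with engine.connect():
    during = pool_counts(engine)

# On release the connection goes back into the pool instead of being closed.
after = pool_counts(engine)

engine.dispose()
```

If the pool really is lazy, `before` shows nothing opened, `during` shows one checked-out connection, and `after` shows one idle connection held by the pool.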

FYI this is the test script/result I was playing with:
```
▶ cat test.py
import os
from multiprocessing import Pool

print('execute module code')

def test_func(num):
    print(num)

if __name__ == '__main__':
    pool = Pool(4)
    results = pool.map(test_func, [1,2,3,4], 1)
    pool.close()
    pool.join()
▶ python test.py
execute module code
1
2
3
4

-------------- mimic Windows behavior ----------------
▶ cat test.py
import os
from multiprocessing import Pool

print('execute module code')

def test_func(num):
    from airflow import settings
    print(num)
    print(settings.engine.pool.status())

if __name__ == '__main__':
    pool = Pool(4)
    results = pool.map(test_func, [1,2,3,4], 1)
    pool.close()
    pool.join()

▶ python test.py
execute module code
execute module code
execute module code
execute module code
execute module code
airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting 
up DB connection pool (PID 80204)
airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting 
up DB connection pool (PID 80202)
airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting 
up DB connection pool (PID 80201)
airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting 
up DB connection pool (PID 80203)
airflow.settings [2018-09-07 17:37:24,219] {{settings.py:176}} INFO - 
setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=3600
airflow.settings [2018-09-07 17:37:24,219] {{settings.py:176}} INFO - 
setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=3600
airflow.settings [2018-09-07 17:37:24,219] {{settings.py:176}} INFO - 
setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=3600
airflow.settings [2018-09-07 17:37:24,219] {{settings.py:176}} INFO - 
setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=3600
Pool size: 5  Connections in pool: 0 Current Overflow: -5 Current Checked out 
connections: 0
Pool size: 5  Connections in pool: 0 Current Overflow: -5 Current Checked out 
connections: 0
Pool size: 5  Connections in pool: 0 Current Overflow: -5 Current Checked out 
connections: 0
Pool size: 5  Connections in pool: 0 Current Overflow: -5 Current Checked out 
connections: 0
airflow.utils.log.logging_mixin.LoggingMixin [2018-09-07 17:37:24,332] 
{{__init__.py:42}} DEBUG - Cannot import  due to  doesn't look like a module 
path
airflow.utils.log.logging_mixin.LoggingMixin [2018-09-07 17:37:24,332] 
{{__init__.py:42}} DEBUG - Cannot import  due to  doesn't look like a module 
path
airflow.utils.log.logging_mixin.LoggingMixin [2018-09-07 17:37:24,332] 
{{__init__.py:42}} DEBUG - Cannot import  due to  doesn't look like a module 
path
airflow.utils.log.logging_mixin.LoggingMixin [2018-09-07 17:37:24,332] 
{{__init__.py:42}} DEBUG - Cannot import  due to  doesn't look like a module 
path
4
1
2
3
Pool size: 5  Connections in pool: 0 Current Overflow: -5 Current Checked out 
connections: 0
Pool size: 5  Connections in pool: 0 Current Overflow: -5 Current Checked out 
connections: 0
Pool size: 5  Connections in pool: 0 Current Overflow: -5 Current Checked out 
connections: 0
Pool size: 5  Connections in pool: 0 Current Overflow: -5 Current Checked out 
connections: 0
```

[ Full content available at: 
https://github.com/apache/incubator-airflow/pull/3830 ]