[
https://issues.apache.org/jira/browse/AIRFLOW-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anonymous reassigned AIRFLOW-2058:
----------------------------------
Assignee: Yang Pan
> Scheduler uses MainThread for DAG file processing
> -------------------------------------------------
>
> Key: AIRFLOW-2058
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2058
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG
> Affects Versions: 1.9.0
> Environment: Ubuntu, Airflow 1.9, Sequential executor
> Reporter: Yang Pan
> Assignee: Yang Pan
> Priority: Blocker
>
> By reading the [source code
> |https://github.com/apache/incubator-airflow/blob/61ff29e578d1121ab4606fe122fb4e2db8f075b9/airflow/utils/dag_processing.py#L538]
> it appears the scheduler will process each DAG file, either a .py or .zip,
> using a new process.
>
> If I understand correctly, in theory what should happen in terms of
> processing a .zip file is that the dedicated process will add the .zip file
> to the PYTHONPATH, and load the file's module and dependency. When the DAG
> read is done, the process gets destroyed. And since the PYTHONPATH is process
> scoped, it won't pollute other processes.
>
> However by printing out the threads and process id, it looks like Airflow
> scheduler can sometimes accidentally pick up the main process instead of
> creating a new one, and that's when collision happens.
>
> Here is snippet of the PYTHONPATH when advanced_dag_dependency-1.zip is being
> processed. As you can see when it's executed by MainThread, it contains other
> .zip files. When it's using dedicated thread, only required .zip is added.
>
> sys.path :['/root/airflow/dags/yang_subdag_2.zip',
> '/root/airflow/dags/yang_subdag_2.zip',
> '/root/airflow/dags/yang_subdag_1.zip',
> '/root/airflow/dags/yang_subdag_1.zip',
> '/root/airflow/dags/advanced_dag_dependency-2.zip',
> '/root/airflow/dags/advanced_dag_dependency-2.zip',
> '/root/airflow/dags/advanced_dag_dependency-1.zip',
> '/root/airflow/dags/advanced_dag_dependency-1.zip',
> '/root/airflow/dags/yang_subdag_1', '/usr/local/bin', '/usr/lib/python2.7',
> '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk',
> '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload',
> '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages',
> '/usr/lib/python2.7/dist-packages/PILcompat', '/root/airflow/config',
> '/root/airflow/dags', '/root/airflow/plugins']
> Print from MyFirstOperator in Dag 1
> process id: 5059
> thread id: <_MainThread(*MainThread*, started 140339858560768)>
>
> sys.path :[u'/root/airflow/dags/advanced_dag_dependency-1.zip',
> '/usr/local/bin', '/usr/lib/python2.7',
> '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk',
> '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload',
> '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages',
> '/usr/lib/python2.7/dist-packages/PILcompat', '/root/airflow/config',
> '/root/airflow/dags', '/root/airflow/plugins']
> Print from MyFirstOperator in Dag 1
> process id: 5076
> thread id: <_MainThread(*DagFileProcessor283*, started 140137838294784)>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)