What's the advantage of this change? Performance?

Alek
On Mon, Nov 27, 2017 at 1:11 PM, [email protected] <[email protected]> wrote:

> Hi all,
>
> I wanted to gauge community interest in this idea we have. We are
> currently running a modified version of Airflow 1.9 RC3 where we ignore
> processing DAG definition Python files for paused DAGs. By default,
> list_py_file_paths traverses the dags subdirectory to look for Python
> files, and the scheduler processes all these files, regardless of whether
> the DAGs defined in these files are paused or not. Our proposed
> modification was to query the fileloc column in the dag table, filtering
> on is_paused=1 and is_active=1 to get a list of file paths for paused DAGs.
> Then, we can exclude these files from the known_file_paths, so that the
> scheduler does not process these files. This feature can be set on and off
> via a scheduler config variable.
>
> If anyone is interested, we already have the code written, so we'd be
> happy to package up our changes and create a PR.
>
> Thanks!
> -Andy
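
For readers following the thread, here is a rough sketch of the filtering step described in the quoted message. It is not Andy's actual patch and does not use Airflow's own code. It assumes a stand-in SQLite table named dag with the fileloc, is_paused and is_active columns mentioned above; the function names paused_dag_file_paths and filter_known_file_paths and the skip_paused flag are hypothetical, chosen only for illustration.

```python
import sqlite3


def paused_dag_file_paths(conn):
    """Return the file paths of DAGs that are both paused and active.

    Hypothetical helper: mirrors the query described in the thread
    (fileloc filtered on is_paused=1 and is_active=1).
    """
    rows = conn.execute(
        "SELECT DISTINCT fileloc FROM dag "
        "WHERE is_paused = 1 AND is_active = 1"
    ).fetchall()
    return {row[0] for row in rows}


def filter_known_file_paths(known_file_paths, conn, skip_paused=True):
    """Drop files belonging to paused DAGs from the list to be processed.

    skip_paused stands in for the scheduler config variable mentioned in
    the thread; its real name is not given there.
    """
    if not skip_paused:
        return list(known_file_paths)
    paused = paused_dag_file_paths(conn)
    return [path for path in known_file_paths if path not in paused]


# Stand-in for the metadata database, with made-up example rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag ("
    "dag_id TEXT PRIMARY KEY, fileloc TEXT, "
    "is_paused INTEGER, is_active INTEGER)"
)
conn.executemany(
    "INSERT INTO dag VALUES (?, ?, ?, ?)",
    [
        ("etl_daily", "/dags/etl_daily.py", 0, 1),    # running DAG
        ("old_report", "/dags/old_report.py", 1, 1),  # paused and active
        ("retired", "/dags/retired.py", 1, 0),        # paused but inactive
    ],
)
conn.commit()

known_file_paths = [
    "/dags/etl_daily.py",
    "/dags/old_report.py",
    "/dags/retired.py",
    "/dags/new_unseen.py",  # not yet in the dag table, so still processed
]
remaining = filter_known_file_paths(known_file_paths, conn)
print(remaining)
```

With the flag off, the list is returned unchanged, which matches the on/off behaviour described in the proposal. Files not yet recorded in the table are kept, so newly added DAG files would still be picked up.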
