I believe that is mostly because we want to skip parsing/loading .py files that don't contain DAG definitions, to save time: the scheduler is going to parse/load the .py files over and over again, and some files can take quite long to load.
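Concretely, the check soma quotes below boils down to a cheap substring scan over the raw file bytes before any import happens. A minimal sketch of the idea (paraphrased, not the exact Airflow source; the helper name here is made up):

    # Skip importing a .py file unless its raw bytes contain both
    # b'DAG' and b'airflow'; a file failing this check is assumed
    # not to define any DAGs, so it is never imported.
    def might_contain_dag(path):
        with open(path, 'rb') as f:
            content = f.read()
        return all(s in content for s in (b'DAG', b'airflow'))

The trade-off is that a generated loader file which never literally mentions those strings is silently skipped, which is exactly what happened here.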
Cheers,
Kevin Y

On Fri, Nov 23, 2018 at 12:44 AM soma dhavala <soma.dhav...@gmail.com> wrote:
> happy to report that the "fix" worked. thanks Alex.
>
> btw, wondering why it was there in the first place? how does it help: saves time, early termination, what?
>
> > On Nov 23, 2018, at 8:18 AM, Alex Guziel <alex.guz...@airbnb.com> wrote:
> >
> > Yup.
> >
> > On Thu, Nov 22, 2018 at 3:16 PM soma dhavala <soma.dhav...@gmail.com> wrote:
> >
> >> On Nov 23, 2018, at 3:28 AM, Alex Guziel <alex.guz...@airbnb.com> wrote:
> >>
> >> It's because of this:
> >>
> >> "When searching for DAGs, Airflow will only consider files where the strings "airflow" and "DAG" both appear in the contents of the .py file."
> >>
> >
> > Had not noticed it. From airflow/models.py, in process_file (both in 1.9 and 1.10):
> >
> > ..
> > if not all([s in content for s in (b'DAG', b'airflow')]):
> > ..
> >
> > is looking for those strings, and if they are not found, it returns without loading the DAGs.
> >
> > So having "airflow" and "DAG" dummy strings placed somewhere will make it work?
> >
> >> On Thu, Nov 22, 2018 at 2:27 AM soma dhavala <soma.dhav...@gmail.com> wrote:
> >>
> >>> On Nov 22, 2018, at 3:37 PM, Alex Guziel <alex.guz...@airbnb.com> wrote:
> >>>
> >>> I think this is what is going on. The DAGs are picked up by local variables, i.e. if you do
> >>>
> >>> dag = DAG(...)
> >>> dag = DAG(...)
> >>>
> >>> only the second dag will be picked up.
> >>
> >> from my_module import create_dag
> >>
> >> for file in yaml_files:
> >>     dag = create_dag(file)
> >>     globals()[dag.dag_id] = dag
> >>
> >> Notice that create_dag is in a different module. If it is in the same scope (file), it will be fine.
> >>
> >>> On Thu, Nov 22, 2018 at 2:04 AM Soma S Dhavala <soma.dhav...@gmail.com> wrote:
> >>>
> >>> Hey AirFlow Devs:
> >>> In our organization, we build a Machine Learning WorkBench with AirFlow as an orchestrator of the ML workflows, and have wrapped AirFlow python operators to customize the behaviour. These workflows are specified in YAML.
> >>>
> >>> We drop a DAG loader (written in python) in the default location where airflow expects the DAG files. This DAG loader reads the specified YAML files and converts them into airflow DAG objects. Essentially, we are programmatically creating the DAG objects. In order to support multiple parsers (yaml, json etc.), we separated the DAG creation from the loading. But when a DAG is created (in a separate module) and made available to the DAG loader, airflow does not pick it up. As an example, consider that I created a DAG, pickled it, and will simply unpickle the DAG and hand it to airflow.
> >>>
> >>> However, in the current avatar of airflow, the very creation of the DAG has to happen in the loader itself. As far as I am concerned, airflow should not care where and how the DAG object is created, so long as it is a valid DAG object. The workaround for us is to mix the parser and the loader in the same file and drop it in the airflow default dags folder. During dag_bag creation, this file is loaded up with the import_modules utility and shows up in the UI. While this is a solution, it is not clean.
> >>>
> >>> What do DEVs think about a solution to this problem? Will saving the DAG to the db and reading it from the db work? Or do some core changes need to happen in dag_bag creation? Can dag_bag take a bunch of "created" DAGs?
> >>>
> >>> thanks,
> >>> -soma
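For reference, the workaround that ended up working here is a single loader file dropped in the default dags folder that (a) literally contains the strings "airflow" and "DAG" somewhere in its contents, so it survives the substring pre-filter, and (b) binds each generated DAG to a module-level name, so the dag_bag's variable scan finds it. A minimal sketch (the file name, YAML glob path, and the create_dag factory are illustrative, not actual project code):

    # yaml_dag_loader.py -- lives in the airflow dags folder.
    # Note: this comment alone mentions "airflow" and "DAG", which is
    # enough to pass the scheduler's substring pre-filter.
    import glob

    from my_module import create_dag  # hypothetical YAML -> DAG factory

    for yaml_file in glob.glob('/path/to/workflows/*.yaml'):
        dag = create_dag(yaml_file)
        # DAGs are discovered via module-level variables in this file,
        # so each generated DAG must be bound to a global name here:
        globals()[dag.dag_id] = dag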