What about a manifest file that names all the DAGs? Or a naming convention for the DAG files themselves?
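
To make those two options concrete, here is a rough sketch of file discovery that relies on a manifest or a naming convention instead of scanning file contents. The manifest name `dag_manifest.json`, the `*_dag.py` pattern, and the function `list_dag_files` are all made up for illustration; none of them exist in Airflow.

```python
import fnmatch
import json
import os


def list_dag_files(dags_folder,
                   manifest_name="dag_manifest.json",  # hypothetical file name
                   pattern="*_dag.py"):                # hypothetical convention
    """Return the files to treat as DAG definitions without reading their contents."""
    manifest_path = os.path.join(dags_folder, manifest_name)
    if os.path.isfile(manifest_path):
        # Option 1: an explicit manifest, assumed to be a JSON list of
        # file names relative to the DAGs folder.
        with open(manifest_path) as f:
            names = json.load(f)
    else:
        # Option 2: no manifest, so fall back to a file naming convention.
        names = fnmatch.filter(os.listdir(dags_folder), pattern)
    # Silently skip manifest entries that do not exist on disk.
    return sorted(
        os.path.join(dags_folder, name)
        for name in names
        if os.path.isfile(os.path.join(dags_folder, name))
    )
```

Either way, a file that only imports a custom subclass such as "MyPipeline" would be picked up because of where it is listed or how it is named, not because of the strings it happens to contain.
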
Alternatively, there could be a single entry point (i.e., index.py) from which all the DAGs are instantiated. There's probably some complexity in making that work with the multi-process scheduler model, but it doesn't seem insurmountable.

On Thu, May 10, 2018 at 10:31 AM, Arthur Wiedmer <arthur.wied...@gmail.com> wrote:

> Hi Song,
>
> I agree that this is not ideal, but it is difficult to do otherwise without
> parsing/executing the Python code.
>
> Note that an import from airflow should be enough, or DAG in a comment. I
> think we are open to other solutions, if anyone on the list has better
> ideas.
>
> Best,
> Arthur
>
> On Thu, May 10, 2018 at 12:59 AM Song Liu <song...@outlook.com> wrote:
>
> > Hi,
> >
> > I just created a custom DAG class named "MyPipeline" by extending the
> > "DAG" class, but Airflow fails to identify the file as a DAG file.
> >
> > After digging into the Airflow implementation around the
> > dag_processing.py file, I found:
> >
> > ```
> > # Heuristic that guesses whether a Python file contains an
> > # Airflow DAG definition.
> > might_contain_dag = True
> > if safe_mode and not zipfile.is_zipfile(file_path):
> >     with open(file_path, 'rb') as f:
> >         content = f.read()
> >         might_contain_dag = all(
> >             [s in content for s in (b'DAG', b'airflow')])
> > ```
> >
> > So if the file contains both keywords "DAG" and "airflow", it is
> > treated as a DAG file.
> >
> > I wonder whether there is a more scientific way to do this.
> >
> > Thanks,
> > Song