The answer to both questions is "because the DAGs are Python files" and "because that's how it is now / we haven't written it yet".

Historically Airflow needed the actual Python code in the DAG files to do anything with them (show them in the UI, schedule them, or execute them). With Airflow 2.0 making DAG serialization mandatory, the UI no longer needs the files, and neither does the "main" scheduler -- but the DAG parsing process still requires DAGs on disk, and actual task execution will always need the DAG files.
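As a rough illustration (this is a hypothetical, simplified shape, not Airflow's actual serialization schema), the serialized form only needs to capture the DAG's *structure* -- task IDs, operator types, dependencies, schedule -- which is plain data that fits in the metadata DB; the Python callables themselves are not part of that structure:

```python
import json

# Hypothetical, simplified sketch of a serialized DAG: enough structure
# for a UI or scheduler to work with, but no Python callables -- those
# still live only in the DAG file on disk.
serialized_dag = {
    "dag_id": "example_etl",            # made-up example DAG
    "schedule_interval": "@daily",
    "tasks": [
        {"task_id": "extract", "operator": "PythonOperator",
         "downstream": ["transform"]},
        {"task_id": "transform", "operator": "PythonOperator",
         "downstream": []},
    ],
}

# This blob is what could land in the metadata DB as plain JSON.
blob = json.dumps(serialized_dag)
print(blob)
```

The UI can render the graph and the scheduler can order tasks from data like this alone; only the worker that executes `extract` still has to import the real function from the DAG file.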

The main reason execution needs DAG files is to support Python operators (which call Python functions defined in your DAG file) and custom operators, which are also defined on disk.

We could extend Airflow to support "submitting" DAGs via an API, with the condition that no Python operators and no custom operators are used. Or Python operators could work so long as the callable has no closure, no advanced scoping, etc. But then we have to start worrying about all the edge cases, and the security of the API becomes _much_ more important.
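To see the kind of edge case in question (a plain-Python sketch, no Airflow involved): a top-level function can at least in principle be shipped by reference with the standard machinery -- `pickle` records the module and qualified name -- but a closure captures local state and can't be serialized that way at all:

```python
import pickle

def top_level_task():
    # A module-level function pickles by reference (module + name),
    # so a worker that can import this module can run it.
    return "ok"

def make_closure_task():
    greeting = "hello"
    def task():
        # Captures `greeting` from the enclosing scope -- a closure.
        return greeting
    return task

# Round-trips fine: only a reference is serialized.
restored = pickle.loads(pickle.dumps(top_level_task))
print(restored())

# Fails: pickle cannot serialize a local function object
# (PicklingError or AttributeError depending on the Python version).
try:
    pickle.dumps(make_closure_task())
except (pickle.PicklingError, AttributeError) as exc:
    print("cannot serialize closure:", exc)
```

And even the "pickle by reference" case quietly assumes the worker can import the same module -- which is exactly the "DAG file must exist on disk" requirement again.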

In short, because it's complicated and has some nasty edge cases.

We'll likely get there eventually.

-ash



On Thu, May 6 2021 at 16:22:28 +0800, 落雨留音 <[email protected]> wrote:
1. Why does an Airflow DAG not support reading directly from the DB, instead of from a local file?
The current way of discovering DAGs is to scan local files and then synchronize them to the DB. If I want to create a DAG, I need to create a DAG file in the scheduler's dags_folder and then synchronize that file to the web server and workers. Why can't I store the DAG file in the DB directly, so that the web server, scheduler, and workers all obtain the DAG file through the DB?

2. Why is there no createDag API?
Why is there no API to create a DAG, so that as long as I call the API, the DAG information is synchronized to the DB and to the local files of the web server, scheduler, and workers?
