The answer to both questions is "because the DAGs are Python files" and
"because that's how it is now / we haven't written it yet".
Historically Airflow needed the actual Python code in the DAGs to do
anything with them (show them in the UI, schedule them, or execute
them). With Airflow 2.0, where DAG serialization becomes mandatory, the
UI no longer needs the files, and neither does the "main" scheduler;
but the DAG parsing process still requires DAGs on disk, and actual
task execution will always need the DAG files.
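To make that concrete, here is a rough sketch (my illustration, using
Airflow 2.x internals whose exact module paths may differ between
versions): a DAG round-trips through a JSON-able dict, and that dict,
stored in the DB, is all the UI and scheduler need to work from:

    # Sketch only: SerializedDAG is an internal Airflow 2.x helper.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.serialization.serialized_objects import SerializedDAG

    with DAG(dag_id="demo", start_date=datetime(2021, 1, 1),
             schedule_interval="@daily") as dag:
        BashOperator(task_id="hello", bash_command="echo hello")

    # This dict is JSON-serializable and is what gets stored in the DB;
    # the UI and scheduler work from this representation, not the file.
    payload = SerializedDAG.to_dict(dag)
    dag_again = SerializedDAG.from_dict(payload)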
The main reason execution needs DAG files is to support Python
operators (which call Python functions defined in your DAG file) and
custom operators, which may also be defined on disk.
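For example, a DAG like this (a hypothetical sketch, not something
from this thread) can only be executed by a worker that has the same
file on disk, because the callable is an ordinary Python function
defined in that file:

    # example_dag.py -- hypothetical illustration.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # Ordinary Python defined in the DAG file; a worker can only
        # resolve it by importing this exact file.
        print("extracting...")

    with DAG(dag_id="example", start_date=datetime(2021, 1, 1),
             schedule_interval=None) as dag:
        PythonOperator(task_id="extract", python_callable=extract)

The serialized form of this DAG records only a reference to the
callable, so a worker without the file has no way to run the task.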
We could extend Airflow to support "submitting" DAGs via an API, on
the condition that no Python operators and no custom operators are
used. Or Python operators could be allowed so long as the callable has
no closure, no advanced scoping, etc. But then we would have to start
worrying about all the edge cases, and the security of the API would
become _much_ more important.
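The closure restriction is the sort of edge case meant here. A
hypothetical sketch: a callable that captures state from an enclosing
scope cannot be reconstructed from its source text alone, so an API
submission would silently lose the captured value:

    # Hypothetical sketch of the "closure" edge case.
    def build_check(threshold):
        def check():
            # `threshold` is bound in build_check's scope, not in the
            # body of check; shipping check's source text alone over
            # an API would lose this binding.
            print(f"threshold is {threshold}")
        return check

    check_callable = build_check(42)  # fine locally, hard to serialize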
In short, because it's complicated and has some nasty edge cases.
We'll likely get there eventually.
-ash
On Thu, May 6 2021 at 16:22:28 +0800, 落雨留音
<[email protected]> wrote:
1. Why does an Airflow DAG not support being read directly from the
DB, instead of from a local file?
The current way of discovering DAGs is to scan local files and then
synchronize them to the DB. If I want to create a DAG, I need to
create a DAG file in the scheduler's dags_folder and then synchronize
that file to the web server and workers. Why can't I store the DAG
file directly in the DB, so that the web server, scheduler, and
workers all obtain it through the DB?
2. Why is there no createDag API?
Why is there no API to create a DAG, so that as soon as I call the
API, the DAG information is synchronized to the DB and to the local
files of the web server, scheduler, and workers?