For example, the two webserver API handlers below get the DAG out of one
global dagbag object, which is instantiated when the app instance is created,
so the app/webserver can't be aware of any new DAGs until it is re-launched
again? What is this design for?
```
# module-level DagBag, built once when views.py is imported
dagbag = models.DagBag(settings.DAGS_FOLDER)

@expose('/run')
def run(self):
    dag_id = request.args.get('dag_id')
    dag = dagbag.get_dag(dag_id)
    ...

@expose('/trigger')
def trigger(self):
    dag_id = request.args.get('dag_id')
    dag = dagbag.get_dag(dag_id)
    ...
```
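Just to illustrate what I mean by "can't be aware of any new DAGs" (a rough
sketch on my side, assuming the 1.9-era DagBag API with dagbag.dags and
collect_dags(); not actual Airflow code):
```
# Sketch: the module-level dagbag above goes stale because it is
# filled exactly once, when the module is imported.
from airflow import models, settings

dagbag = models.DagBag(settings.DAGS_FOLDER)  # parses DAGS_FOLDER at import time

def known_dag_ids():
    # Reflects DAGS_FOLDER as of process start; a DAG file added
    # afterwards will not appear here.
    return list(dagbag.dags.keys())

# Unless something re-collects, e.g.:
#     dagbag.collect_dags(settings.DAGS_FOLDER)
# the webserver only sees new DAGs after a restart.
```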
________________________________
From: Song Liu (Brain++ Group) <[email protected]>
Sent: Sunday, May 13, 2018 5:39:40 AM
To: [email protected]
Subject: Re: About the DAG discovering not synced between scheduler and webserver
Hi,
It seems that Airflow currently has to handle the situations below:
- DAGs discovered by the scheduler, but not yet discovered by the webserver
- DAGs discovered by the webserver, but not yet discovered by the scheduler
I still don't quite understand why the discovery logic exists separately in
both the scheduler and the webserver. Based on my understanding, the webserver
only needs to display the orm_dags from the metadata DB. Is there any
requirement or design consideration besides this?
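To make the idea concrete, below is a rough sketch of what I imagine "display
the orm_dags from metadb" could look like (assuming the DagModel table and
is_active column from airflow.models; this is my illustration, not actual
webserver code):
```
# Sketch: build the DAG list for the UI purely from the metadata DB,
# instead of re-parsing DAGS_FOLDER inside the webserver.
from airflow import models, settings

session = settings.Session()
orm_dags = (
    session.query(models.DagModel)
    .filter(models.DagModel.is_active == True)  # noqa: E712 (SQLAlchemy filter)
    .all()
)
session.close()

# dag_id (plus paused state) is all the list view really needs per row.
for dag in orm_dags:
    print(dag.dag_id, dag.is_paused)
```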
Many thanks for any information.
Thanks,
Song
________________________________
From: Song Liu <[email protected]>
Sent: Saturday, May 12, 2018 7:58:43 PM
To: [email protected]
Subject: About the DAG discovering not synced between scheduler and webserver
Hi,
When adding a new DAG, we sometimes see:
```
This DAG isn't available in the web server's DagBag object. It shows up in this
list because the scheduler marked it as active in the metadata database.
```
In views.py, DAGs under "DAGS_FOLDER" are collected by instantiating a
DagBag object, as below:
```
dagbag = models.DagBag(settings.DAGS_FOLDER)
```
So the webserver depends on its own timing to collect DAGs. But why not
simply query the metadata DB? If a DAG is active in the DB, it could then be
visible in the web UI right away.
Could someone share the reasoning behind this design?
Thanks,
Song