BasPH edited a comment on issue #4396: [AIRFLOW-3585] - Add edges to database
URL: https://github.com/apache/airflow/pull/4396#issuecomment-466340099
 
 
   I think this PR is a great effort by @ffinfo to persist more of the DAG in 
the database. I hope some time in the future we can get more and more of a DAG 
in the database, to make it the single source of truth, with the ultimate goal 
of having a stateless webserver so we don't have to refresh for minutes after a 
change.
   
   Since the change touches various parts of the core of Airflow, I hope some 
clarification helps this PR:
   
   - This PR persists task dependencies in a new table `dag_edge`.
   - The term "graph" is introduced in the code. A graph contains the structure 
of a DAG: the "edges" (dependencies) and "nodes" (tasks).
   - A DagRun is bound to one `graph_id`.
   - Currently in Airflow only the latest version of a DAG is displayed in the 
UI (both graph & tree view). This means that if you delete a task, you can no 
longer see past runs of that task.
   - In the graph view you can now see different graph versions, because we 
store both tasks and edges.
   - For the record: in the tree view you still only see the latest version, 
because it is not possible to combine all history into a single view.
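
   To make the terminology above concrete, here is a minimal sketch of how a 
DagRun bound to a `graph_id` can still resolve the structure it ran with after 
a task is deleted. It uses plain Python rather than the actual models in this 
PR; the class names, fields, and the `graph_for_run` helper are assumptions 
for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

# An edge is a dependency: (upstream_task_id, downstream_task_id).
Edge = Tuple[str, str]


@dataclass(frozen=True)
class Graph:
    """One version of a DAG's structure (hypothetical, not the PR's model)."""
    graph_id: int
    nodes: FrozenSet[str]   # task ids
    edges: FrozenSet[Edge]  # task dependencies


@dataclass
class DagRun:
    """A run that remembers which graph version it executed (hypothetical)."""
    run_id: str
    graph_id: int


def graph_for_run(run: DagRun, graphs: Dict[int, Graph]) -> Graph:
    """Look up the graph version a run is bound to."""
    return graphs[run.graph_id]


# Version 1 has three tasks; version 2 has the "load" task deleted.
graphs = {
    1: Graph(1,
             frozenset({"extract", "transform", "load"}),
             frozenset({("extract", "transform"), ("transform", "load")})),
    2: Graph(2,
             frozenset({"extract", "transform"}),
             frozenset({("extract", "transform")})),
}

old_run = DagRun("run_before_change", graph_id=1)
new_run = DagRun("run_after_change", graph_id=2)

# The old run still sees the deleted task; the new run does not.
print(sorted(graph_for_run(old_run, graphs).nodes))
print(sorted(graph_for_run(new_run, graphs).nodes))
```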
   
   Changes from a user perspective:
   - Nothing in the tree view.
   - In the graph view, you can now view different "graphs" if you change the 
structure of your DAG. Note the graph view shows DAG runs. If you change your 
DAG without running it, it does not show in the graph view.
   - When you have no DAG runs, there is no graph to show. In that case, as 
@ffinfo described above, the graph is read from the DAG file instead. You can 
see this behaviour in the graph view URL:
        - if DagRuns exist: http://host/graph?dag_id=my_dag
        - if no DagRuns exist: 
http://host/graph?dag_id=my_dag&read_from_file=True
   - The screenshots in 
https://github.com/apache/airflow/pull/4396#issuecomment-465217731 show 
this case. Since this is an internal detail of how Airflow works and not 
really informative for the user, @ffinfo removed the message in his last commit.
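
   The URL behaviour in the list above can be sketched as follows. This is 
not the webserver code from the PR; `graph_view_url` and its `has_dag_runs` 
argument are hypothetical names, and only the two URL shapes are taken from 
the examples above.

```python
from urllib.parse import urlencode


def graph_view_url(host: str, dag_id: str, has_dag_runs: bool) -> str:
    """Build the graph view URL (hypothetical helper, for illustration).

    When no DagRuns exist there is no persisted graph to show, so the
    view falls back to reading the structure from the DAG file.
    """
    params = {"dag_id": dag_id}
    if not has_dag_runs:
        params["read_from_file"] = "True"
    return "http://{}/graph?{}".format(host, urlencode(params))


print(graph_view_url("host", "my_dag", has_dag_runs=True))
print(graph_view_url("host", "my_dag", has_dag_runs=False))
```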
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
