aoen commented on issue #4396: [AIRFLOW-3585] - Add edges to database
URL: https://github.com/apache/airflow/pull/4396#issuecomment-467068171

A couple of high-level comments:

1. There is already a serialized representation of DAGs, so DB serialization should probably go through it, even though that might be a bit more work.
2. The code currently makes one DB call per DAG. These calls should be batched as much as possible (taking care that no single query gets too big). Along these lines, this should probably be stress-tested with a large number of large DAGs. If the DB used for testing is local rather than remote, the results should be scaled by an appropriate factor to estimate real-world performance.

In the long run, I envision DAG serialization happening on new Airflow clients, which would send requests to a new Airflow service that acts as a CRUD wrapper around a DB. That DB would store both the SimpleDag and some kind of reference to an encapsulation of all the Python dependencies for the DAG (e.g. a Docker image name). This way the webserver, worker, and scheduler could all use the same data model and source of truth.

I think it makes sense to figure out the long-term plan first, via an AIP and some brainstorming sessions, and to make sure there is an easy path forward from any intermediate proposals, so we don't make our lives harder later by undoing changes or working out migrations. Security is another thing to keep in mind here, since this work would be required for multi-tenancy in Airflow.

@KevinYang21 I know you have been looking at this problem recently as well, so I'm curious what you think, especially about short-term vs. long-term solutions.
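To make the batching suggestion in point 2 concrete, here is a minimal sketch rather than Airflow's actual code. It assumes a hypothetical `dag_edge` table with `(dag_id, upstream_task_id, downstream_task_id)` columns and uses the standard-library `sqlite3` module as a stand-in for the metadata DB. The names `dag_edge`, `chunked`, and `write_edges_batched` are illustrative only. Writes for many DAGs are grouped into fixed-size chunks, so the number of calls no longer grows one-per-DAG and no single statement's parameter list grows without bound.

```python
import sqlite3
from typing import Dict, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical row shape for the hypothetical `dag_edge` table:
# (dag_id, upstream_task_id, downstream_task_id).
Edge = Tuple[str, str, str]


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_edges_batched(
    conn: sqlite3.Connection,
    edges_by_dag: Dict[str, List[Edge]],
    batch_size: int = 300,
) -> int:
    """Replace the stored edges of every DAG in `edges_by_dag`.

    Work is grouped into chunks of at most `batch_size` DAG ids (for the
    DELETE) or rows (for the INSERT) instead of one round of queries per DAG.
    Returns the number of INSERT batches issued.
    """
    dag_ids = list(edges_by_dag)
    rows = [edge for dag_id in dag_ids for edge in edges_by_dag[dag_id]]
    insert_batches = 0
    with conn:  # one transaction: commits on success, rolls back on error
        # Remove old edges, keeping each IN (...) list bounded by batch_size.
        for ids in chunked(dag_ids, batch_size):
            placeholders = ",".join("?" * len(ids))
            conn.execute(
                f"DELETE FROM dag_edge WHERE dag_id IN ({placeholders})", list(ids)
            )
        # Insert new edges, one executemany call per chunk of rows.
        for batch in chunked(rows, batch_size):
            conn.executemany("INSERT INTO dag_edge VALUES (?, ?, ?)", batch)
            insert_batches += 1
    return insert_batches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dag_edge "
        "(dag_id TEXT, upstream_task_id TEXT, downstream_task_id TEXT)"
    )
    edges = {
        f"dag_{i}": [
            (f"dag_{i}", "extract", "transform"),
            (f"dag_{i}", "transform", "load"),
        ]
        for i in range(1000)
    }
    print(write_edges_batched(conn, edges))  # 2000 rows / 300 per batch -> 7
    print(conn.execute("SELECT COUNT(*) FROM dag_edge").fetchone()[0])  # 2000
```

The chunk size is the knob for "the query doesn't get too big". Databases cap the number of bound parameters per statement (SQLite's default was 999 before version 3.32.0), so the 300-id default keeps each `IN (...)` list comfortably under that. Whether `executemany` also reduces network round trips against a remote DB depends on the driver, which is one more reason to stress-test against a remote instance rather than a local one.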
