aoen edited a comment on issue #4396: [AIRFLOW-3585] - Add edges to database
URL: https://github.com/apache/airflow/pull/4396#issuecomment-467068171
 
 
   I love the spirit of this change, but I'm -1 for now until we come up with 
some kind of long-term plan.
   
   A couple of high-level comments after skimming the PR:
   1. There is already a serialized representation of DAGs, so DB serialization 
should probably go through it, even though that might be a bit more work.
   2. There is currently a DB call for each DAG in your code; these should be 
batched as much as possible (just be careful that the query doesn't get too 
big). Along the same lines, this probably needs to be stress tested with a 
large number of large DAGs. If the DB used for testing is local instead of 
remote, multiply the results by an appropriate factor to see what performance 
would be like in real-world usage.
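
   To illustrate the batching point, here is a minimal sketch using the 
standard-library `sqlite3` module. The `dag_edge` table, its columns, and the 
`save_edges_batched` helper are hypothetical names made up for this example, 
not anything from the PR or from Airflow. The idea is just to flatten the edges 
of all DAGs into one stream of rows and write them in bounded batches, instead 
of issuing one call per DAG.

```python
import sqlite3
from itertools import islice


def chunked(items, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def save_edges_batched(conn, edges_by_dag, batch_size=500):
    """Write edges for many DAGs in bounded batches.

    `edges_by_dag` maps dag_id -> list of (upstream, downstream) task ids.
    The table name `dag_edge` is an assumption made for this sketch.
    Returns the number of executemany() calls that were issued.
    """
    # Flatten every DAG's edges into a single stream of rows.
    rows = (
        (dag_id, upstream, downstream)
        for dag_id, edges in edges_by_dag.items()
        for upstream, downstream in edges
    )
    calls = 0
    for batch in chunked(rows, batch_size):
        # batch_size caps how much is sent per call, so it cannot grow unbounded.
        conn.executemany(
            "INSERT INTO dag_edge (dag_id, upstream, downstream) VALUES (?, ?, ?)",
            batch,
        )
        calls += 1
    conn.commit()
    return calls
```

   With a remote DB, how much this actually saves depends on the driver and on 
how it sends each batch, which is part of why the stress test matters.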
   
   In the long run I envision DAG serialization happening on new Airflow 
clients, which send requests to a new Airflow service that basically serves as 
a CRUD wrapper around a DB. The DB would store both the SimpleDag and some kind 
of reference to an encapsulation of all of the Python dependencies for the DAG 
(e.g. a Docker image name). This way all three of the webserver, worker, and 
scheduler could use the same data model and source of truth. I feel it might 
make sense to figure out the long-term plan first via an AIP and some 
brainstorming sessions, and to make sure there is an easy path forward from any 
intermediate proposals, so we don't make our lives harder later undoing changes 
or figuring out how to do migrations. Security is another thing to keep in mind 
here, since this work would be required for multi-tenancy in Airflow.
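
   As a rough sketch of the kind of data model I have in mind (all names here, 
`DagRecord`, `DagStore`, and the field names, are hypothetical and only for 
illustration; a real service would sit in front of a DB rather than a dict), 
each record pairs the serialized DAG with a reference to its dependencies:

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class DagRecord:
    """One stored DAG: serialized form plus a pointer to its dependencies."""
    dag_id: str
    serialized_dag: str  # e.g. a JSON string derived from the SimpleDag
    dependency_ref: str  # e.g. a Docker image name


class DagStore:
    """In-memory stand-in for the DB-backed CRUD service (assumption)."""

    def __init__(self) -> None:
        self._records: Dict[str, DagRecord] = {}

    def create(self, record: DagRecord) -> None:
        if record.dag_id in self._records:
            raise KeyError(f"DAG {record.dag_id!r} already exists")
        self._records[record.dag_id] = record

    def read(self, dag_id: str) -> Optional[DagRecord]:
        # Returns None when the DAG is unknown.
        return self._records.get(dag_id)

    def update(self, dag_id: str, **changes) -> DagRecord:
        # Raises KeyError when the DAG is unknown.
        updated = replace(self._records[dag_id], **changes)
        self._records[dag_id] = updated
        return updated

    def delete(self, dag_id: str) -> None:
        self._records.pop(dag_id, None)
```

   The webserver, worker, and scheduler would then all read the same records 
through one interface instead of each parsing DAG files on their own.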
   
   @KevinYang21 I know you have been looking at this problem recently as well, 
so I'm curious what you think, especially about short-term vs. long-term 
solutions.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
