GagandeepS commented on issue #19548: URL: https://github.com/apache/airflow/issues/19548#issuecomment-968409201
We have a use case where multiple dynamic DAGs are added to the DagBag. I believe there will always be some latency between dropping a new DAG file into the DAGs folder and the operator being able to find the path/record of that new DAG in the table via trigger_dag.

To give the scheduler as much time as it needs to insert the record into the table, we trigger the new DAG and check whether the trigger succeeds. If it fails (usually with 'Dag xxx does not exist'), the task retries after a delay.

This works well except under peak load, when tens of DAGs are generated dynamically and saved to the DAGs folder at once. In that case the scheduler slows down because it has to insert many records, so trigger_dag (because of the retries) takes 3-10 minutes to succeed. I want to reduce this 3-10 minute delay.

Proposed solution: either add a table to the Airflow backend data model, or use an index, bulk insert, or something similar, so that inserting new records does not degrade scheduler performance and looking up a newly added DAG becomes faster.
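For reference, the trigger-and-retry workaround described above can be sketched as a small retry loop with exponential backoff. This is a minimal sketch under assumptions, not our actual code: `trigger` is a hypothetical callable wrapping whatever trigger call you already use, and `DagNotFoundError` is a hypothetical stand-in for the "DAG does not exist" error (in a real deployment this would be the exception Airflow actually raises).

```python
import time
from typing import Callable, Optional


class DagNotFoundError(Exception):
    """Hypothetical stand-in for the 'Dag xxx does not exist' error raised on trigger."""


def trigger_with_retry(
    trigger: Callable[[str], str],
    dag_id: str,
    max_attempts: int = 10,
    initial_delay: float = 5.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call trigger(dag_id) until it succeeds, doubling the wait between
    attempts (capped at max_delay) while the DAG is not yet registered."""
    delay = initial_delay
    last_error: Optional[DagNotFoundError] = None
    for attempt in range(1, max_attempts + 1):
        try:
            # Hypothetical trigger call; returns some run identifier on success.
            return trigger(dag_id)
        except DagNotFoundError as exc:
            last_error = exc
            if attempt == max_attempts:
                break
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise TimeoutError(
        f"{dag_id} still not found after {max_attempts} attempts"
    ) from last_error
```

Backoff does not shorten the time the scheduler needs to register a new DAG; it only reduces how often the waiting task hits the database while it waits, which matters most during the peak-load case above.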

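To make the "index or bulk insert" idea in the proposal concrete, here is a rough illustration using Python's built-in sqlite3 module as a stand-in backend. It is not how Airflow's metadata database is actually structured: the `dag_registry` table, its columns, and the helper names are all hypothetical. It only shows the two ideas named in the proposal: writing a whole batch of newly generated DAGs in one transaction, and looking DAGs up by an indexed key.

```python
import sqlite3
from typing import Iterable, List, Tuple


def create_registry(conn: sqlite3.Connection) -> None:
    # Hypothetical table; dag_id is the PRIMARY KEY, so lookups by dag_id use its index.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dag_registry ("
        " dag_id TEXT PRIMARY KEY,"
        " fileloc TEXT NOT NULL)"
    )


def bulk_register(conn: sqlite3.Connection, dags: Iterable[Tuple[str, str]]) -> None:
    # One transaction and one executemany call for the whole batch,
    # instead of a separate INSERT and COMMIT per DAG.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO dag_registry (dag_id, fileloc) VALUES (?, ?)",
            dags,
        )


def missing_dags(conn: sqlite3.Connection, dag_ids: List[str]) -> List[str]:
    # Indexed lookups: which of these DAG ids are not registered yet?
    if not dag_ids:
        return []
    placeholders = ",".join("?" * len(dag_ids))
    rows = conn.execute(
        f"SELECT dag_id FROM dag_registry WHERE dag_id IN ({placeholders})",
        dag_ids,
    )
    found = {row[0] for row in rows}
    return [d for d in dag_ids if d not in found]
```

Whether batching like this would actually help depends on where the scheduler spends its time during peak load, which this sketch does not measure.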