SameerMesiah97 commented on code in PR #62600:
URL: https://github.com/apache/airflow/pull/62600#discussion_r2937384408
##########
airflow-core/src/airflow/dag_processing/manager.py:
##########
@@ -971,6 +972,9 @@ def _create_process(self, dag_file: DagFileInfo) -> DagFileProcessorProcess:
         callback_to_execute_for_file = self._callback_to_execute.pop(dag_file, [])
         logger, logger_filehandle = self._get_logger_for_dag_file(dag_file)
+        # TODO: This is a blocking DB call, we should probably cache this or move it elsewhere
+        known_pools = {p.pool for p in Pool.get_pools()}
Review Comment:
I understand that you mentioned the intention to eventually cache this, but
I don't think it is acceptable to introduce a DB call here just to support
`known_pools`. This should not be waved away behind a `#TODO`.
It looks like this can be avoided by initializing `known_pools` alongside
the other DagFileProcessorManager attributes and refreshing it periodically
inside `_run_parsing_loop`. As it stands, this implementation issues an
unnecessary metadata DB query every time `_create_process()` is called.
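
To illustrate the suggestion above, here is a minimal sketch of a periodically refreshed cache. The `PoolCache` class, its `interval` value, and the injectable `fetch` callable are all hypothetical names introduced for illustration; in the manager, `fetch` would be something like `lambda: {p.pool for p in Pool.get_pools()}`, initialized with the other attributes and refreshed from `_run_parsing_loop` rather than queried in `_create_process()`:

```python
import time
from typing import Callable, Set

class PoolCache:
    """Hypothetical cache of known pool names, refreshed at most once per interval.

    `fetch` stands in for the real DB query, e.g.
    lambda: {p.pool for p in Pool.get_pools()}.
    """

    def __init__(self, fetch: Callable[[], Set[str]], interval: float = 30.0):
        self._fetch = fetch
        self._interval = interval
        self._known_pools: Set[str] = set()
        # Force a refresh on first access.
        self._refreshed_at: float = float("-inf")

    def get(self) -> Set[str]:
        now = time.monotonic()
        if now - self._refreshed_at > self._interval:
            # One DB hit per interval instead of one per _create_process() call.
            self._known_pools = self._fetch()
            self._refreshed_at = now
        return self._known_pools
```

With this shape, `_create_process()` reads `self._pool_cache.get()` (a set lookup, no query), and the parsing loop's regular cadence bounds how stale the pool set can get.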