potiuk commented on a change in pull request #21731:
URL: https://github.com/apache/airflow/pull/21731#discussion_r820709494
##########
File path: airflow/dag_processing/manager.py
##########
@@ -591,6 +593,26 @@ def _run_parsing_loop(self):
else:
poll_time = 0.0
+ @provide_session
+ def _fetch_callbacks(self, max_callbacks: int, session: Session = NEW_SESSION):
+ """Fetches callbacks from database and add them to the internal pipe for execution."""
+ if not conf.getboolean("scheduler", "standalone_dag_processor"):
+ # Nothing to do if callbacks are not stored in the database
+ return
+ self.log.debug("Fetching callbacks from the database.")
+ callbacks = (
+ session.query(DbCallbackRequest)
Review comment:
I see we gave up on the idea of using row locks here. Aren't we afraid that
we will start reading and processing the same callbacks multiple times in
multiple parallel DAG processor processes?
I don't see us locking the table (or rows) here, so I think that when this
query runs, the same callbacks can simply be retrieved by multiple processors.
The query is also not a "quick" one, because the loop contains `conn.send()`,
which is a potentially blocking operation (like all socket operations,
especially if the receiving side is not "fast enough"). In such a case it's
virtually guaranteed that the first callback in the DB will be picked up by
multiple processes or DAG processor(s).
Maybe I'm missing something, but I think `FOR UPDATE` and `SKIP LOCKED` are
necessary here.
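For illustration, a minimal sketch of what `FOR UPDATE SKIP LOCKED` could look like here via SQLAlchemy's `with_for_update(skip_locked=True)`. The model and column names below (`DbCallbackRequest`, `priority_weight`) are simplified stand-ins for Airflow's actual ORM model, and `fetch_callbacks` is a hypothetical helper, not the PR's implementation:

```python
from sqlalchemy import Column, Integer
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DbCallbackRequest(Base):
    """Simplified stand-in for Airflow's DbCallbackRequest model."""

    __tablename__ = "callback_request"
    id = Column(Integer, primary_key=True)
    priority_weight = Column(Integer)


def fetch_callbacks(session: Session, max_callbacks: int):
    """Fetch up to max_callbacks rows, skipping rows locked by other processors.

    FOR UPDATE locks the selected rows for the duration of the transaction;
    SKIP LOCKED makes concurrent processors silently skip rows that are
    already locked, so two DAG processors never pick up the same callback.
    """
    callbacks = (
        session.query(DbCallbackRequest)
        .order_by(DbCallbackRequest.priority_weight.desc())
        .with_for_update(skip_locked=True)
        .limit(max_callbacks)
        .all()
    )
    for callback in callbacks:
        # Delete (i.e. claim) each row while the lock is still held, so the
        # row is gone before the transaction commits and releases the lock.
        session.delete(callback)
    return callbacks
```

Note that `SKIP LOCKED` is supported by PostgreSQL and MySQL 8+ but not by SQLite, so a fallback (or a plain `FOR UPDATE`) may be needed for the other backends Airflow supports.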
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]