potiuk commented on a change in pull request #21731:
URL: https://github.com/apache/airflow/pull/21731#discussion_r820709494



##########
File path: airflow/dag_processing/manager.py
##########
@@ -591,6 +593,26 @@ def _run_parsing_loop(self):
                 else:
                     poll_time = 0.0
 
+    @provide_session
+    def _fetch_callbacks(self, max_callbacks: int, session: Session = NEW_SESSION):
+        """Fetches callbacks from database and add them to the internal pipe for execution."""
+        if not conf.getboolean("scheduler", "standalone_dag_processor"):
+            # Nothing to do if callbacks are not stored in the database
+            return
+        self.log.debug("Fetching callbacks from the database.")
+        callbacks = (
+            session.query(DbCallbackRequest)

Review comment:
       I see we gave up on the idea of using row locks here. Aren't we afraid that the same callbacks will be read and processed multiple times by multiple parallel DAG processor processes?
   
   I do not see the table being locked here, so when this query runs, multiple processors can simply retrieve the same callbacks. The query is also not a "quick" one, because it has `conn.send()` in the loop, which is a potentially blocking operation (like all socket operations, especially if the receiving side is not fast enough). In that case it is virtually guaranteed that the first callback in the DB will be picked up by multiple processes or DAG processor(s).
   
   Maybe I am missing something, but I think "for update" and "skip locked" are necessary here.
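   
   To illustrate (a minimal sketch, not the PR's actual code): in SQLAlchemy, `with_for_update(skip_locked=True)` makes the query render `FOR UPDATE SKIP LOCKED` on backends that support it, so each processor would lock the rows it fetched and others would skip them. The `DbCallbackRequest` columns below are hypothetical stand-ins for the real Airflow model, defined only so the snippet is self-contained:
   
   ```python
   from sqlalchemy import Column, Integer, select
   from sqlalchemy.dialects import postgresql
   from sqlalchemy.orm import declarative_base
   
   Base = declarative_base()
   
   
   class DbCallbackRequest(Base):
       # Hypothetical stand-in for airflow.models.DbCallbackRequest,
       # reduced to the columns needed for this illustration.
       __tablename__ = "callback_request"
       id = Column(Integer, primary_key=True)
       priority_weight = Column(Integer)
   
   
   max_callbacks = 10
   
   # Equivalent of session.query(DbCallbackRequest)... with row locking:
   # locked rows held by a parallel DAG processor are skipped rather
   # than fetched twice or waited on.
   stmt = (
       select(DbCallbackRequest)
       .order_by(DbCallbackRequest.priority_weight.desc())
       .limit(max_callbacks)
       .with_for_update(skip_locked=True)
   )
   
   # Render against the PostgreSQL dialect to show the emitted clause.
   sql = str(stmt.compile(dialect=postgresql.dialect()))
   print(sql)
   ```
   
   Note that `SKIP LOCKED` only takes effect inside a transaction and on databases that support it (PostgreSQL, MySQL 8+); the locks are released when the session commits.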




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]