ephraimbuddy commented on code in PR #49097:
URL: https://github.com/apache/airflow/pull/49097#discussion_r2041591540
##########
airflow-core/src/airflow/jobs/scheduler_job_runner.py:
##########
@@ -867,7 +921,14 @@ def process_executor_events(
# Get task from the Serialized DAG
try:
- dag = dag_bag.get_dag(ti.dag_id)
+ dag = scheduler_dag_bag.get_dag(dag_run=ti.dag_run, session=session)
+ cls.logger().error(
+ "DAG '%s' for task instance %s not found in serialized_dag table",
+ ti.dag_id,
+ ti,
+ )
Review Comment:
Since we no longer delete serdags, I think every task instance coming back
from the workers will have its serdag, so this exception handling is no longer
needed. That can go in the next PR, though. It will help to identify issues.
##########
airflow-core/src/airflow/jobs/scheduler_job_runner.py:
##########
@@ -102,6 +102,55 @@
""":meta private:"""
+class SchedulerDagBag:
+ """
+ Internal class for retrieving and caching dags in the scheduler.
+
+ :meta private:
+ """
+
+ def __init__(self):
+ self._dags: dict[str, DAG] = {} # dag_version_id to dag
+
+ def _get_dag(self, version_id: str, session: Session) -> DAG | None:
+ if dag := self._dags.get(version_id):
+ return dag
+ dag_version = session.get(DagVersion, version_id)
+ if not dag_version:
+ return None
+ serdag = dag_version.serialized_dag
Review Comment:
Should we add `lazy="joined"` to the DagVersion-to-serdag relationship so that
`serialized_dag` is always loaded together with the DagVersion?
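
For context, a rough sketch of what that would change at the SQL level. With
the default lazy loading, fetching the DagVersion is followed by a second
SELECT when `dag_version.serialized_dag` is first accessed; with
`lazy="joined"`, SQLAlchemy fetches both rows in one SELECT using a LEFT OUTER
JOIN. The example imitates the two patterns with plain `sqlite3`. The table
names, column names, and helper functions are made up for illustration and are
not Airflow's actual schema or API.

```python
import sqlite3

# Hypothetical two-table layout: one version row, one serialized DAG row.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_version_demo (id TEXT PRIMARY KEY, dag_id TEXT);
    CREATE TABLE serialized_dag_demo (
        id INTEGER PRIMARY KEY,
        dag_version_id TEXT REFERENCES dag_version_demo (id),
        data TEXT
    );
    INSERT INTO dag_version_demo VALUES ('v1', 'example_dag');
    INSERT INTO serialized_dag_demo VALUES (1, 'v1', '{"tasks": []}');
    """
)


def load_lazily(version_id):
    """Two round trips: the version row first, then its serialized DAG."""
    version = conn.execute(
        "SELECT id, dag_id FROM dag_version_demo WHERE id = ?", (version_id,)
    ).fetchone()
    if version is None:
        return None
    serdag = conn.execute(
        "SELECT data FROM serialized_dag_demo WHERE dag_version_id = ?",
        (version_id,),
    ).fetchone()
    return (version[1], serdag[0] if serdag else None)


def load_joined(version_id):
    """One round trip: both rows come back from a single LEFT OUTER JOIN."""
    row = conn.execute(
        """
        SELECT v.dag_id, s.data
        FROM dag_version_demo AS v
        LEFT OUTER JOIN serialized_dag_demo AS s ON s.dag_version_id = v.id
        WHERE v.id = ?
        """,
        (version_id,),
    ).fetchone()
    return None if row is None else (row[0], row[1])
```

Both helpers return the same data; the joined variant just saves one query per
cache miss, at the cost of always pulling the serialized payload even when a
caller only needs the version row.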
##########
airflow-core/src/airflow/jobs/scheduler_job_runner.py:
##########
@@ -199,7 +248,7 @@ def __init__(
if log:
self._log = log
- self.dagbag = DagBag(read_dags_from_db=True, load_op_links=False)
+ self.scheduler_dag_bag = SchedulerDagBag()
Review Comment:
Should we load all serdags on initialization here, more like what DagBag
does?
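
To make the trade-off concrete, here is a minimal sketch of the two
strategies: filling the cache on first lookup versus loading everything up
front. `VersionedDagCache`, `fetch_one`, and `fetch_all` are hypothetical
stand-ins for the cache and its DB access, not Airflow APIs, and plain strings
stand in for DAG objects.

```python
from __future__ import annotations

from typing import Callable, Iterable


class VersionedDagCache:
    """Hypothetical cache keyed by dag_version_id (values stand in for DAGs)."""

    def __init__(
        self,
        fetch_one: Callable[[str], str | None],
        fetch_all: Callable[[], Iterable[tuple[str, str]]] | None = None,
    ):
        self._fetch_one = fetch_one
        self._dags: dict[str, str] = {}
        if fetch_all is not None:
            # Eager variant: pay the full load cost once, at start-up.
            self._dags.update(fetch_all())

    def get_dag(self, version_id: str) -> str | None:
        # Lazy variant: hit the backing store only on a cache miss.
        if (dag := self._dags.get(version_id)) is not None:
            return dag
        dag = self._fetch_one(version_id)
        if dag is not None:
            self._dags[version_id] = dag
        return dag


# Fake backing store that records every single-row lookup.
STORE = {"v1": "dag-one", "v2": "dag-two"}
calls: list[str] = []


def fetch_one(version_id: str) -> str | None:
    calls.append(version_id)
    return STORE.get(version_id)


def fetch_all() -> Iterable[tuple[str, str]]:
    return STORE.items()


lazy = VersionedDagCache(fetch_one)
lazy.get_dag("v1")
lazy.get_dag("v1")  # second call is served from the cache

eager = VersionedDagCache(fetch_one, fetch_all)
eager.get_dag("v2")  # already preloaded, so fetch_one is not called
```

Since the cache is keyed by version rather than by dag_id, preloading would
pull in every stored version unless it is restricted somehow, so the lazy
variant may be the safer default.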