ashb commented on a change in pull request #17819:
URL: https://github.com/apache/airflow/pull/17819#discussion_r703064312
##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -596,13 +597,24 @@ def _process_executor_events(self, session: Session = None) -> int:
)
self.log.error(msg, ti, state, ti.state, info)
+ try:
+ get_dag(self.subdir, ti.dag_id)
Review comment:
It's not clear to me why we even need this? What's going on here?
##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -49,6 +49,7 @@
from airflow.ti_deps.dependencies_states import EXECUTION_STATES
from airflow.utils import timezone
from airflow.utils.callback_requests import DagCallbackRequest, TaskCallbackRequest
+from airflow.utils.cli import get_dag
Review comment:
This function loads the actual DAG from the file on disk, which we can't
do in the scheduler.
We can't because Airflow should never load DAG files into a long-running
process, and the scheduler is one. Doing so would mean a badly written DAG
file could bring down the scheduler, which right now is isolated from that
risk.
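To illustrate the isolation concern, here is a minimal sketch that uses only the standard library. It is not Airflow code: `parse_in_subprocess` and `demo` are hypothetical names, and running the file in a child interpreter is just one assumed way to keep a bad file from taking down a long-running parent process.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def parse_in_subprocess(path: str, timeout: float = 30.0) -> Tuple[bool, str]:
    """Run a Python file in a child interpreter and report whether it loaded.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A file that hangs at import time only costs us the child process.
        return False, "timed out"
    if result.returncode != 0:
        return False, f"exit code {result.returncode}"
    return True, "ok"


def demo() -> Tuple[bool, str]:
    # Stand-in for a badly written DAG file: it kills its own interpreter
    # as soon as it is executed.
    bad_source = "import os\nos._exit(1)\n"
    with tempfile.TemporaryDirectory() as tmp:
        bad_file = Path(tmp) / "bad_dag.py"
        bad_file.write_text(bad_source)
        # The parent survives and simply observes the failure.
        return parse_in_subprocess(str(bad_file))
```

Had the same file been imported directly into the calling process, `os._exit(1)` would have terminated that process instead of only the child.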