potiuk commented on code in PR #40894:
URL: https://github.com/apache/airflow/pull/40894#discussion_r1685004350
##########
airflow/serialization/pydantic/taskinstance.py:
##########
@@ -458,9 +458,9 @@ def schedule_downstream_tasks(self, session: Session | None = None, max_tis_per_
:meta: private
"""
- return TaskInstance._schedule_downstream_tasks(
- ti=self, session=session, max_tis_per_query=max_tis_per_query
- )
+ # we should not schedule downstream tasks with Pydantic model because it will not be able to
+ # get the DAG object (we do not serialize it currently).
+ return
Review Comment:
Yeah. I even attempted that, but it turns out to be rather useless: effectively
this code runs in a remote component (internal API) that does not have the DAG
object, and the only way to get that DAG object is to parse it. The whole idea
of the mini-scheduler is that we already have the DAG object in memory, so that
we can get the downstream deps. Bringing it back would mean we either have to
parse the DAG in the internal_api component, or serialize the DAG on the worker
and deserialize it in the internal API. I'd opt for the latter, because if we
start parsing the DAG in the internal-api component, the DAG parsing could end
up accessing the DB directly (the internal_api component has DB access). Not to
mention that most of the benefits of the mini-scheduler (DAG already loaded in
memory) largely disappear in that setup.
So the easiest way is to just skip the mini-scheduler for now. But we can bring
it back later.
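To make the trade-off concrete, here is a minimal self-contained sketch (plain
Python with hypothetical class and field names, not the actual Airflow
implementation): the mini-scheduler only makes sense when the caller already
holds the parsed DAG in memory, so on the Pydantic/internal-API path, where no
DAG object is attached, it degrades to a no-op.

```python
# Hypothetical sketch, not real Airflow code: models why the mini-scheduler
# is skipped on the internal-API (Pydantic) path, where no DAG object exists.

class TaskInstance:
    """Worker-side TI: the parsed DAG object is already in memory."""

    def __init__(self, dag):
        # `dag` stands in for the parsed DAG the worker already loaded
        self.dag = dag

    def schedule_downstream_tasks(self):
        # mini-scheduler: downstream deps are reachable because the DAG is here
        return list(self.dag["downstream"])


class TaskInstancePydantic:
    """Internal-API-side TI: sent over the wire without a DAG attached."""

    def schedule_downstream_tasks(self):
        # we cannot get the DAG object (it is not serialized currently),
        # so the mini-scheduler is skipped entirely
        return None


dag = {"downstream": ["task_b", "task_c"]}
print(TaskInstance(dag).schedule_downstream_tasks())      # downstream deps
print(TaskInstancePydantic().schedule_downstream_tasks())  # no-op
```

The sketch also shows why "serialize the DAG from the worker" is the only way
to restore the behaviour: the Pydantic variant would need the DAG shipped to it
before it could walk downstream deps at all.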
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]