jscheffl commented on code in PR #42048:
URL: https://github.com/apache/airflow/pull/42048#discussion_r1773984567
##########
airflow/providers/edge/executors/edge_executor.py:
##########
@@ -151,6 +151,18 @@ def cleanup_stuck_queued_tasks(self, tis: list[TaskInstance]) -> list[str]:  # p
        """
        raise NotImplementedError()
+    def try_adopt_task_instances(self, tis: Sequence[TaskInstance]) -> Sequence[TaskInstance]:
+        """
+        Try to adopt running task instances that have been abandoned by a SchedulerJob dying.
+
+        Anything that is not adopted will be cleared by the scheduler (and then become eligible for
+        re-scheduling)
+
+        :return: any TaskInstances that were unable to be adopted
+        """
+        # We handle all running tasks from the DB in sync, no adoption logic needed.
Review Comment:
For the MVP this is intended.
I would still refer to the fact that this is an MVP; the main intent is to have something running first and then to optimize.
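Purely as an illustration of the point (a hypothetical sketch, not the actual method body of this PR, which is truncated in the diff above): because all open jobs stay tracked in the shared DB table, "no adoption logic needed" can reduce to treating every handed-over task as adopted.

```python
from __future__ import annotations

from collections.abc import Sequence

from airflow.models.taskinstance import TaskInstance


class EdgeExecutorSketch:  # hypothetical stand-in, not the real EdgeExecutor class
    def try_adopt_task_instances(self, tis: Sequence[TaskInstance]) -> Sequence[TaskInstance]:
        # Running jobs remain visible to every scheduler via the shared jobs
        # table, so nothing has to be handed back for clearing/re-scheduling.
        # Returning an empty sequence reports "nothing was left un-adopted".
        return []
```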
I would not say the schedulers are stepping on each other's toes; they just share one DB table for all open jobs. Jobs are purged from the table a few minutes after completion, so it is all very simple, short-running transactions and everything should sit in the Postgres cache. The shared table contains N*PARALLELISM open records for N schedulers (e.g., three schedulers at the default parallelism of 32 would mean on the order of 96 open rows).
In the AIP I documented that a maximum of 100 workers is the planned start for the MVP.
Later improvement:

But as there might be doubt... I just created a small DAG with a dynamically mapped PythonOperator containing a single print() statement, mapped to 500 tasks. Executed on my laptop with the webserver, 1 scheduler and 4 workers (each with a concurrency of 8), the 500 tasks completed in 11:48 min, and almost all of that time was spent by the Python processes the workers fork off to start/initialize. The second biggest CPU consumer was the webserver (~1 core) for the API backend. The scheduler as well as the Postgres DB each consumed ~3-5% of one core.
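For reference, a sketch of the kind of test DAG described above (the DAG id, dates and exact mapping are my assumptions, not necessarily the DAG actually used):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def say_hello(i):
    # The whole "workload": a single print() per mapped task instance.
    print(i)


with DAG(
    dag_id="edge_load_test",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    # Dynamically map one PythonOperator to 500 task instances.
    PythonOperator.partial(task_id="say_hello", python_callable=say_hello).expand(
        op_args=[[i] for i in range(500)]
    )
```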
I do not have more hardware at hand to put more load on via additional workers, but the scheduler did not show any performance problem in this small test.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]