jscheffl commented on code in PR #42048:
URL: https://github.com/apache/airflow/pull/42048#discussion_r1773984567


##########
airflow/providers/edge/executors/edge_executor.py:
##########
@@ -151,6 +151,18 @@ def cleanup_stuck_queued_tasks(self, tis: list[TaskInstance]) -> list[str]:  # p
         """
         raise NotImplementedError()
 
+    def try_adopt_task_instances(self, tis: Sequence[TaskInstance]) -> Sequence[TaskInstance]:
+        """
+        Try to adopt running task instances that have been abandoned by a SchedulerJob dying.
+
+        Anything that is not adopted will be cleared by the scheduler (and then become eligible for
+        re-scheduling)
+
+        :return: any TaskInstances that were unable to be adopted
+        """
+        # We handle all running tasks from the DB in sync, no adoption logic needed.

Review Comment:
   For the MVP this is intended.
   
   I would still note the fact that this is an MVP: the main intent is to have something running first and then to optimize.
   I would not say the schedulers are stepping on each other's toes; they just share one DB table for all open jobs. Jobs are purged from the table a few minutes after completion, so it is all very simple, short-running transactions, and everything should stay in the Postgres cache. The shared table contains N*PARALLELISM open records for N schedulers.
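   A back-of-the-envelope sketch of that table-size claim (assuming PARALLELISM here means each scheduler's executor parallelism; 32 is the Airflow default for `[core] parallelism` — the helper name is made up for illustration):

   ```python
   # Rough sketch of the shared-table sizing described above.
   # Assumption: "PARALLELISM" is each scheduler's executor parallelism
   # (Airflow's [core] parallelism setting, default 32).
   def max_open_records(n_schedulers: int, parallelism: int = 32) -> int:
       """Upper bound on open job records in the shared jobs table."""
       return n_schedulers * parallelism


   # e.g. 3 schedulers at the default parallelism
   print(max_open_records(3))  # 96
   ```

   So even a handful of schedulers keeps the shared table at only a few hundred rows, which fits the "all in Postgres cache" argument.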
   
   In the AIP I documented for the MVP that a maximum of 100 workers is the planned starting point. Later improvement:
   
![image](https://github.com/user-attachments/assets/7f6bf38d-940a-4bf0-bdd2-caec5d6e66ad)
   
   But as there might be doubt... I just created a small DAG with a dynamically mapped PythonOperator containing a single print() statement, mapped to 500 tasks. Executed on my laptop with a webserver, 1 scheduler, and 4 workers (each with a concurrency of 8), the 500 tasks completed in 11:48 min. Almost all of that time was spent by the Python processes forked off the workers to start/init. The second-biggest CPU consumer was the webserver (~1 core) for the API backend; the scheduler as well as the Postgres DB each consumed only ~3-5% of one core.
   I do not have more hardware at hand to distribute more load across workers, but the scheduler in this small test was not showing any performance challenge.
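   The test setup above can be sketched roughly like this (a hypothetical reconstruction, not the exact DAG from the test; DAG id and task id are made up):

   ```python
   # Hypothetical reconstruction of the load-test DAG described above:
   # one dynamically mapped PythonOperator with a single print(),
   # expanded to 500 mapped tasks. Names are illustrative only.
   from datetime import datetime

   from airflow import DAG
   from airflow.operators.python import PythonOperator


   def say(value):
       print(value)


   with DAG(
       dag_id="edge_executor_load_test",  # made-up id
       start_date=datetime(2024, 1, 1),
       schedule=None,
       catchup=False,
   ):
       PythonOperator.partial(
           task_id="print_task",
           python_callable=say,
       ).expand(op_args=[[i] for i in range(500)])
   ```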



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
