dabla commented on PR #56457:
URL: https://github.com/apache/airflow/pull/56457#issuecomment-3727510532

   > > > > I have not had much luck scale testing this, however; having 100 concurrent tasks running seems to overload my laptop (the behavior is consistent without this PR as well)
   > > > 
   > > > 
   > > > Haha, yeah, running ~125 tasks took ~12GB RAM on top of the rest on my laptop (each task just a Python sleep()). But that is at least a factor of 10 more than the previous implementation.
   > > 
   > > 
   > > How come the sudden increase? That’s huge
   > 
   > The worker forks a Python process per supervisor, and each supervisor forks another Python process to isolate the workload. 125 running tasks therefore mean 1 + 125*2 == 251 processes: 125 are sleeping (the tasks), and 125 + 1 are sending heartbeats to the API.
   > 
   > Subtracting the worker, I think this is actually very lean: 12GB / 125 workloads == ~98MB per workload == <50MB per Python interpreter (whereas the previous process manager, despite benefiting from COW, shows ~250MB/process in the task manager).
   > 
   > I assume some memory can be saved if we also implement the gc-freeze like in #58365. I did not apply it in this PR, but the same approach would work here.
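[Editorial note: the gc-freeze trick referred to here is the pattern from the CPython `gc` module docs: move all tracked objects into the permanent generation before fork() so the child's collector does not touch, and thereby copy-on-write, the parent's pages. A minimal sketch of that pattern, not the actual code from #58365:]

```python
import gc
import os

gc.disable()       # avoid a collection between freeze and fork
gc.freeze()        # move all tracked objects into the permanent generation
pid = os.fork()
if pid == 0:
    # Child: its collector now ignores the frozen (shared) objects,
    # so pages inherited from the parent stay shared via COW.
    gc.enable()
    # ... run the task here ...
    os._exit(0)
else:
    gc.unfreeze()  # parent: make the objects collectable again
    os.waitpid(pid, 0)
    gc.enable()
```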
   > 
   > A major saving would be possible if the supervisor gained async capabilities, such that the supervision of multiple tasks runs in a single process using async loops. But that would be less crash-resistant and a major rework of the supervisor. Maybe also in the future...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
