dabla commented on PR #56457: URL: https://github.com/apache/airflow/pull/56457#issuecomment-3727510532
> > > > I have not had much luck scale testing this, however; having 100 concurrent tasks running seems to overload my laptop (the behavior is consistent without this PR as well).
> > >
> > > Haha, yeah, running ~125 tasks took ~12 GB RAM on top of the rest on my laptop (each task just a Python `sleep()`). But well, at least a factor of 10 more than the previous implementation.
> >
> > How come the sudden increase? That's huge.
>
> The worker forks a Python process per supervisor, and the supervisor forks another Python process to isolate the workload. 125 running tasks therefore mean 1 + 125*2 == 251 processes: 125 sleeping, and 125 + 1 sending heartbeats to the API.
>
> Subtracting the worker, I think this is actually very lean: 12 GB / 125 workloads == ~98 MB per workload == <50 MB per Python interpreter (benefitting from COW, even though the task manager shows ~250 MB/process).
>
> I assume some memory can be saved if we also implement the gc-freeze like in #58365 - it was not applied in this PR but would be the same here.
>
> A major saving would be possible if the supervisor gained async capabilities, so that supervision of multiple tasks runs in a single process with async loops - but that would be less crash-resistant and a major rework of the supervisor. Maybe also in the future...
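As a back-of-envelope check of the figures quoted above (the 12 GB is the observed measurement from the comment; the variable names are mine):

```python
# Process count: 1 worker, plus one supervisor and one task process per task.
n_tasks = 125
processes = 1 + n_tasks * 2            # == 251

# Memory: 12 GB observed on top of the base system, spread over 125 workloads.
per_workload_mb = 12 * 1024 / n_tasks  # ~98 MB per supervisor+task pair
per_interpreter_mb = per_workload_mb / 2  # <50 MB per Python interpreter
```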
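For readers unfamiliar with the gc-freeze trick mentioned for #58365: CPython's `gc.freeze()` moves all currently tracked objects into a permanent generation that collections never visit, so a forked child's garbage collector does not rewrite object headers and dirty copy-on-write pages. A minimal sketch (the `spawn_supervised_child` helper is hypothetical, not Airflow's actual code; Unix-only because of `os.fork()`):

```python
import gc
import os

def spawn_supervised_child():
    """Fork a child that keeps sharing parent memory pages via COW.

    Disabling gc before freeze() avoids a collection racing the fork;
    freezing keeps the child's collector from touching pre-fork objects.
    """
    gc.disable()      # no collection between freeze() and fork()
    gc.freeze()       # move existing objects out of the collector's reach
    pid = os.fork()
    if pid == 0:
        gc.enable()   # child: collect only objects allocated from now on
        os._exit(0)   # real code would run the task workload here
    gc.unfreeze()     # parent: resume normal collection
    gc.enable()
    os.waitpid(pid, 0)
    return pid
```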
