killua1zoldyck commented on issue #31105:
URL: https://github.com/apache/airflow/issues/31105#issuecomment-1605737022

   We can assign hard limits, and that would solve the current issue: we can sum file sizes before loading any of them into memory (or most of them; I am not sure we can get sizes for all of them). But even for files smaller than the limit, I believe we should not have to load the whole file into memory and re-sort on every auto-tailing call.
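   A minimal sketch of the hard-limit idea: stat the files whose size we can determine up front and refuse to load them all if the total exceeds a cap. The limit value and helper names are hypothetical, not existing Airflow APIs.

```python
import os

# Illustrative hard limit; the actual value would be configurable.
MAX_TOTAL_BYTES = 512 * 1024 * 1024  # 512 MiB

def total_log_size(paths):
    """Sum the sizes of the log files we can stat without reading them."""
    total = 0
    for path in paths:
        try:
            total += os.path.getsize(path)
        except OSError:
            # Size unknown (e.g. a remote stream); skip it here and let
            # the caller decide how to treat unknown-size sources.
            pass
    return total

def within_limit(paths, limit=MAX_TOTAL_BYTES):
    """Check the summed sizes against the hard limit before loading."""
    return total_log_size(paths) <= limit
```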
   
   > * instead of storing the logs in in-memory lists, stream them to temporary 
files and read them from there (and then indeed k-way merge would be better
   
   We can do this. The temp file will be in sorted order, but if the task is still running, new logs may have arrived by the time we send our response. To handle that, we would need to maintain a log position per log stream and call the reading methods with the appropriate metadata to update the temp file. Some of the read methods, like the HDFS one, would still have to load the whole file into memory; for those we could filter after loading. I believe this is worth doing, since it could reduce memory usage and network congestion, and since we already send metadata back and forth, we might as well include the log positions of a few more files. What do you think?
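   The temp-file approach above could be sketched like this: each source is streamed to its own sorted temp file, a saved byte offset per stream lets a later call resume where the previous one stopped, and `heapq.merge` does the k-way merge lazily so peak memory is proportional to the number of streams rather than the total log size. File layout and offset handling here are illustrative assumptions, not the existing Airflow reader interface.

```python
import heapq

def merged_lines(paths, offsets=None):
    """Lazily k-way merge sorted log files, resuming from saved offsets.

    `paths` are per-stream temp files already sorted by their line prefix
    (e.g. a timestamp); `offsets` maps path -> byte position reached on
    the previous auto-tailing call. Yields merged lines one at a time.
    """
    offsets = offsets or {}
    streams = []
    try:
        for path in paths:
            f = open(path, "r")
            # Resume each stream where the previous call left off.
            f.seek(offsets.get(path, 0))
            streams.append(f)
        # heapq.merge pulls one line at a time from each open stream,
        # so we never hold a whole log file in memory.
        yield from heapq.merge(*streams)
    finally:
        for f in streams:
            f.close()
```

Lines are compared lexicographically here; that works when every line starts with a sortable timestamp prefix, otherwise a `key=` extracting the timestamp would be needed.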

