killua1zoldyck commented on issue #31105: URL: https://github.com/apache/airflow/issues/31105#issuecomment-1605737022
We can assign hard limits, and that would solve the current issue: we can sum the file sizes before loading any of them (or most of them; I am not sure we can get all of their sizes) into memory. But even for files smaller than that limit, I believe we should not load the whole file into memory and sort it on every auto-tailing call.

> * instead of storing the logs in in-memory lists, stream them to temporary files and read them from there (and then indeed k-way merge would be better)

We can do this. The merged output would then be in sorted order; however, if the task is still running, new logs may arrive in the time it takes to send our response. To handle that, we would need to maintain a log position for each log stream and call the reading methods with the appropriate metadata so the temp file can be updated incrementally.

For some of the methods, such as the HDFS one, we would still need to load the whole file into memory; for those we could filter after loading. I believe this is worth doing: it reduces memory usage and network congestion, and since we already send metadata back and forth, we might as well include the log positions of a few more files. What do you think?
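A minimal sketch of the temp-file idea, not Airflow's actual implementation: each stream's (already sorted) lines are spilled to a temporary file, then k-way merged with `heapq.merge` so only one line per stream sits in memory at a time. The helper names are hypothetical, and it assumes every log line begins with a lexicographically sortable timestamp prefix.

```python
import heapq
import tempfile


def spill_to_tempfile(lines):
    """Write one stream's (already sorted) log lines to a temp file.

    The temp file stands in for an in-memory list, so memory use per
    stream is bounded regardless of the log's size.
    """
    f = tempfile.TemporaryFile(mode="w+")
    f.writelines(line if line.endswith("\n") else line + "\n" for line in lines)
    f.seek(0)
    return f


def merge_streams(streams):
    """k-way merge of sorted log streams.

    Assumes lines start with a sortable timestamp prefix, e.g.
    '[2023-06-24T12:00:00] ...', so plain string comparison orders them.
    heapq.merge reads lazily: one line per open file at a time.
    """
    files = [spill_to_tempfile(s) for s in streams]
    try:
        yield from heapq.merge(*files)
    finally:
        for f in files:
            f.close()
```

For incremental tailing, each stream's byte offset would be carried in the response metadata (as suggested above) so the next call can `seek()` past already-merged lines instead of re-reading the whole file.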
