potiuk commented on issue #44753: URL: https://github.com/apache/airflow/issues/44753#issuecomment-2525193224
Mostly agree with @jscheffl -> but I still think merging logs **might** be useful in some cases. However, the "naive" version as it is done now should either be limited to a log size that can fit in memory, or fixed to support arbitrary log sizes. Loading the whole log into memory is generally a bad idea (but OK if we can confirm it will fit). There are algorithms that can do this "well" even when the logs are huge, but they require much more sophisticated behaviour and are likely not suitable to run in an "API" call. So I am not sure it's worth doing at all (Airflow is NOT a sophisticated logging solution), but since we have this "download full log" feature (and even there we could download zipped logs from several sources without merging them), that could be a useful counterpart to merging for big files.

There are two options for how to do it, I think:

1) If the files are huge (we could set an arbitrary threshold here), only show the original task log (streaming it) and add a link to "download" the .zip file where you can see the missing logs as well.
2) If the files are huge, download only the "max" part of each file, perform the merge, and add a note that the logs are incomplete and that the user should download the whole .zip content of the several logs.

Generally, yes, I think we should implement it (and prevent the OOM from happening).
