potiuk commented on issue #44753:
URL: https://github.com/apache/airflow/issues/44753#issuecomment-2525193224

   Mostly agree with @jscheffl -> but I still think merging logs **might** be 
useful in some cases, though the "naive" way it is done now should either be 
limited to log sizes that can fit in memory or fixed to support arbitrary log 
sizes. Loading the whole log into memory is generally a bad idea (but OK if we 
can confirm it will fit in memory).
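   A size guard along these lines could work; this is only a sketch, and both the cap value and the helper name are hypothetical, not actual Airflow settings:

   ```python
   import os

   # Hypothetical cap on the combined size of logs we are willing to
   # merge in memory; 50 MiB is an illustrative value, not an actual
   # Airflow configuration option.
   MAX_MERGE_BYTES = 50 * 1024 * 1024

   def can_merge_in_memory(paths, max_bytes=MAX_MERGE_BYTES):
       """Return True only if the combined log size fits under the cap."""
       total = sum(os.path.getsize(p) for p in paths if os.path.exists(p))
       return total <= max_bytes
   ```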
   
   There are algorithms that could do this "well" even when the logs are huge, 
but they require much more sophisticated behaviour and are likely not suitable 
to run in an "API" call. So I am not sure whether it is worth doing at all 
(Airflow is NOT a sophisticated logging solution) - but if we have this 
"download full log" option (and even there we could download zipped logs from 
several sources without merging them) - that could be a useful counterpart to 
merging for big files.
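   For the record, one way to merge huge logs without loading them fully is a lazy k-way merge over already time-sorted streams, e.g. via `heapq.merge`. This is just a sketch; it assumes each line starts with an ISO-8601 timestamp, which is not necessarily Airflow's real log format:

   ```python
   import heapq
   from datetime import datetime

   def _line_timestamp(line):
       # Assumes lines look like "2024-12-06T10:00:01 message ..." -
       # an illustrative format, not Airflow's actual one.
       return datetime.fromisoformat(line.split(" ", 1)[0])

   def merge_log_streams(*streams):
       """Lazily merge sorted log-line streams.

       Memory use stays proportional to the number of streams,
       not their total size.
       """
       yield from heapq.merge(*streams, key=_line_timestamp)
   ```

   Because `heapq.merge` consumes its inputs lazily, the merged output could itself be streamed to the client line by line.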
   
   There are two options for how to do it, I think:
   
   1) if the files are huge (and we could set an arbitrary threshold here), only 
show the original task log (streaming it) and add a link to "download" the .zip 
file where you can see the missing logs as well
   2) if the files are huge, just read the "max" part of each file, perform the 
merge, and add a note that the logs are incomplete and that you should download 
the whole .zip content of the several logs
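   Option 2 could look roughly like this: read at most a fixed number of bytes from each file, combine only that, and flag the truncation so the UI can point at the full .zip download. All names here are hypothetical, and the sketch concatenates parts rather than merging them by timestamp:

   ```python
   def read_capped(path, max_bytes):
       """Read at most max_bytes from a log file; report if it was truncated."""
       with open(path, "rb") as f:
           data = f.read(max_bytes)
           truncated = bool(f.read(1))  # any remaining byte means we cut it off
       return data, truncated

   def capped_log_view(paths, max_bytes_per_file):
       """Build a partial combined view of several logs, noting truncation.

       A real implementation would merge lines by timestamp rather than
       concatenate file parts; this only illustrates the capping idea.
       """
       parts, any_truncated = [], False
       for path in paths:
           data, truncated = read_capped(path, max_bytes_per_file)
           parts.append(data)
           any_truncated = any_truncated or truncated
       note = b"\n[Logs incomplete - download the full .zip to see everything]\n"
       return b"".join(parts) + (note if any_truncated else b"")
   ```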
   
   Generally, yes, I think we should implement it (and prevent the OOM from 
happening).

