stephen-bracken opened a new issue, #61499:
URL: https://github.com/apache/airflow/issues/61499

   ### Description
   
   Most remote log handlers (e.g. `S3TaskHandler`, `ElasticsearchTaskHandler`) 
use `FileTaskHandler` as their base handler, which reads the whole log file and 
then serves it to the user. This causes problems for very large log files, 
because the webserver has to serve the entire log file to the browser before 
any logs are shown.
   On S3 this can cause memory issues: the entire log file has to be loaded 
into memory on the worker before it can be written, and into the webserver's 
memory before it can be served to the user. As a result, logs may never be 
written if the worker's thread gets OOMKilled, and the webserver may crash if 
too many users read large log files at the same time.
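   To illustrate the memory concern, here is a minimal sketch of reading a log 
in fixed-size chunks instead of loading it whole. The name `iter_log_lines` and 
the `chunk_size` parameter are hypothetical and not part of Airflow; 
`io.BytesIO` only stands in for a remote object body.

```python
import io
from typing import BinaryIO, Iterator


def iter_log_lines(body: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[str]:
    """Yield decoded log lines, reading at most chunk_size bytes per call.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    pending = b""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; the tail may be
        # a partial line, so it is carried over to the next chunk.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line.decode("utf-8", errors="replace")
    if pending:
        # Final line without a trailing newline.
        yield pending.decode("utf-8", errors="replace")


# io.BytesIO is a stand-in for a streamed remote object (assumption).
remote_body = io.BytesIO(b"first line\nsecond line\nthird line")
lines = list(iter_log_lines(remote_body, chunk_size=8))
```

   With this approach the reader holds roughly one chunk plus the longest 
line in memory at a time, rather than the whole file.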
   On Elasticsearch the experience can also be poor, because logs have to be 
read with paginated queries, and the server limits the number of logs per page 
(10000 by default).
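   The paginated case could be handled by yielding each page as soon as it 
arrives. The sketch below assumes a hypothetical `fetch_page(cursor, size)` 
callable that returns one page of lines plus a cursor for the next page; it 
does not use the real Elasticsearch client, and `fake_fetch_page` is only an 
in-memory stand-in.

```python
from typing import Callable, Iterator, Optional

# A page is (log lines, cursor for the next page, or None when exhausted).
Page = tuple[list[str], Optional[int]]


def iter_paginated_logs(
    fetch_page: Callable[[Optional[int], int], Page],
    page_size: int = 10_000,
) -> Iterator[str]:
    """Yield log lines page by page instead of collecting every page first."""
    cursor: Optional[int] = None
    while True:
        lines, cursor = fetch_page(cursor, page_size)
        yield from lines
        if cursor is None:
            break


# In-memory stand-in for a paginated log store (assumption for the demo).
FAKE_LOGS = [f"log line {i}" for i in range(25)]


def fake_fetch_page(cursor: Optional[int], size: int) -> Page:
    start = cursor or 0
    end = start + size
    next_cursor = end if end < len(FAKE_LOGS) else None
    return FAKE_LOGS[start:end], next_cursor


streamed = list(iter_paginated_logs(fake_fetch_page, page_size=10))
```

   Because the generator yields after every page, a consumer can start 
showing the first lines before later pages have been requested.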
   
   ### Use case/motivation
   
   Please add a base task log handler for remote log stores that streams logs 
both while writing them and while reading them back to the user.
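   As a rough idea of what such a base could look like, the sketch below 
buffers a bounded number of lines before writing each part and reads parts back 
lazily. All names here (`StreamingRemoteLogIO`, `write_part`, `read_parts`, 
`part_size`, `InMemoryLogIO`) are hypothetical; this is not the design of the 
related PR or any existing Airflow class.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator


class StreamingRemoteLogIO(ABC):
    """Hypothetical base: subclasses move log data in bounded parts."""

    part_size = 3  # lines per part; deliberately tiny for the demo

    @abstractmethod
    def write_part(self, key: str, lines: list[str]) -> None:
        """Persist one part of a log to the remote store."""

    @abstractmethod
    def read_parts(self, key: str) -> Iterator[list[str]]:
        """Yield stored parts one at a time."""

    def stream_write(self, key: str, lines: Iterable[str]) -> None:
        buffer: list[str] = []
        for line in lines:
            buffer.append(line)
            if len(buffer) >= self.part_size:
                # Flush as soon as the buffer is full so memory stays bounded.
                self.write_part(key, buffer)
                buffer = []
        if buffer:
            self.write_part(key, buffer)

    def stream_read(self, key: str) -> Iterator[str]:
        for part in self.read_parts(key):
            yield from part


class InMemoryLogIO(StreamingRemoteLogIO):
    """Stand-in store for the demo; a real subclass would talk to S3 etc."""

    def __init__(self) -> None:
        self.parts: dict[str, list[list[str]]] = {}

    def write_part(self, key: str, lines: list[str]) -> None:
        self.parts.setdefault(key, []).append(lines)

    def read_parts(self, key: str) -> Iterator[list[str]]:
        yield from self.parts.get(key, [])


store = InMemoryLogIO()
store.stream_write("dag/task/1.log", (f"line {i}" for i in range(7)))
read_back = list(store.stream_read("dag/task/1.log"))
```

   In this sketch neither the writer nor the reader ever holds more than one 
part in memory, which is the property the request is after.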
   
   ### Related issues
   
   related PR: #61492
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

