stephen-bracken opened a new issue, #61499: URL: https://github.com/apache/airflow/issues/61499
### Description

Most remote log handlers (e.g. `S3TaskHandler`, `ElasticsearchTaskHandler`) use `FileTaskHandler` as their base, which reads a whole log file and then serves it to the user. This causes problems for very large log files, because the webserver has to serve the entire log file to the browser before any logs are shown.

On S3 this can cause memory issues: the entire log file has to be loaded into memory on the worker before it can be written, and into the webserver's memory before it can be served to the user. As a result, logs may never be written if the worker process gets OOMKilled, and the webserver may crash if too many users read large log files at the same time.

On Elasticsearch the experience can also be poor, because reads have to go through paginated queries with a server-side limit on the number of log entries per page (10,000 by default).

### Use case/motivation

Please add a base task log handler for remote log stores that streams logs both while writing them and while reading them back to the user.

### Related issues

Related PR: #61492

### Are you willing to submit a PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
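To make the request in the description more concrete, here is a rough sketch of the two streaming patterns it asks for: reading a large log in bounded chunks rather than all at once, and walking a paginated store one page at a time. This is not Airflow code and does not use any Airflow, S3, or Elasticsearch API. The names `stream_log_chunks`, `stream_paginated_logs`, and `fetch_page` are hypothetical and exist only for illustration.

```python
import io
from typing import BinaryIO, Callable, Iterator, List


def stream_log_chunks(source: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield a log in chunks of at most chunk_size bytes.

    Only one chunk is held in memory at a time, instead of the whole file.
    """
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_paginated_logs(
    fetch_page: Callable[[int, int], List[str]],
    page_size: int = 10000,
) -> Iterator[str]:
    """Yield log lines page by page from a paginated store.

    fetch_page(offset, limit) is a hypothetical callable standing in for a
    paginated query; it returns at most `limit` lines starting at `offset`.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        yield from page
        if len(page) < page_size:
            # A short page means there is nothing left to fetch.
            break
        offset += len(page)


if __name__ == "__main__":
    # In-memory stand-ins for a remote log file and a paginated log store.
    fake_file = io.BytesIO(b"task log line\n" * 5)
    for chunk in stream_log_chunks(fake_file, chunk_size=16):
        print(len(chunk))

    fake_store = [f"log entry {i}" for i in range(25)]
    for line in stream_paginated_logs(lambda off, lim: fake_store[off:off + lim], page_size=10):
        print(line)
```

With generators like these, the first chunk or page can be sent to the reader while the rest is still being fetched, so memory use is bounded by the chunk or page size rather than by the total log size.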
