jason810496 commented on issue #44753: URL: https://github.com/apache/airflow/issues/44753#issuecomment-2526209568
Hi, after tracking down related GitHub issues and attempting to reproduce the OOM issue, I have some questions and ideas about related improvements.

### Related Issues and Context

- https://github.com/apache/airflow/pull/29390
  - Limits UI rendering to 10,000 lines, but the webserver still reads the full logs.
  - The browser tab still freezes as well.
- https://github.com/apache/airflow/issues/29405
- https://github.com/apache/airflow/pull/30729
  - Only adds a description of the `log_pos` and `end_of_log` metadata using `URLSafeSerializer`; the `get_log` API remains unchanged.
- https://github.com/apache/airflow/issues/33625
  - Covers pagination on the UI side, which hasn't been resolved yet.
  - Suggests exposing `log_pos` and `end_of_log` as query parameters for better UI pagination. As noted by @bbovenzi, this remains unresolved.

### Ideas for Related Improvements

> I’d be happy to take on these improvements if there are no objections!

1. **Expose `log_pos` and `end_of_log` as Query Parameters**:
   - Implement in Flask and backport to `v2.10.x`.
   - Implement in FastAPI (the new UI will have to implement the pagination logic as well).
2. **Pagination for Legacy UI**:
   - Implement pagination for logs in the legacy UI after addressing the above.
3. **Refactor `FileTaskHandler`**:
   - Address the root cause of OOM directly at the handler level.

Though I couldn't reliably reproduce the OOM on the webserver, the browser tab crashing on large logs remains an issue.

### Question

How can I reliably reproduce the OOM issue when fetching large task logs? I’ve tried observing memory usage with [Memray](https://bloomberg.github.io/memray/run.html), but the OOM was hard to trigger.

- **Steps Taken**:
  - Used the DAG mentioned in [PR #29390](https://github.com/apache/airflow/pull/29390).
  - Triggered runs multiple times and managed to reproduce the OOM only once.
  - Used `breeze start-airflow --python 3.9 --backend sqlite` as the environment.

Any tips on how to consistently reproduce the issue?
Or is there a specific environment setup recommended for this?
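
For reference, the `log_pos` / `end_of_log` pagination idea above could look roughly like the sketch below. It is a minimal, stdlib-only illustration and not Airflow's actual `FileTaskHandler` or `get_log` code: the names `read_log_chunk` and `iter_log`, the `max_bytes` parameter, and the shape of the metadata dict are all assumptions made for the example.

```python
import os


def read_log_chunk(path, log_pos=0, max_bytes=1024 * 1024):
    """Read at most max_bytes from path, starting at byte offset log_pos.

    Hypothetical helper (not part of Airflow). Returns (chunk, metadata),
    where metadata carries the next offset and whether the current end of
    the file has been reached.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(log_pos)
        data = f.read(max_bytes)
    new_pos = log_pos + len(data)
    metadata = {"log_pos": new_pos, "end_of_log": new_pos >= size}
    # errors="replace" keeps decoding from failing if a multi-byte character
    # is split across two chunks; a real implementation would need to handle
    # that boundary properly.
    return data.decode("utf-8", errors="replace"), metadata


def iter_log(path, max_bytes=1024 * 1024):
    """Yield the log chunk by chunk so only one chunk is held in memory."""
    log_pos, end_of_log = 0, False
    while not end_of_log:
        chunk, metadata = read_log_chunk(path, log_pos, max_bytes)
        log_pos, end_of_log = metadata["log_pos"], metadata["end_of_log"]
        yield chunk
```

A client would pass the returned `log_pos` back on the next request and stop once `end_of_log` is true. Note that in this sketch `end_of_log` only means the current end of the file was reached, so a still-running task would need a separate signal.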
