jason810496 commented on issue #44753:
URL: https://github.com/apache/airflow/issues/44753#issuecomment-2526209568

   Hi, after tracking down related GitHub issues and attempting to reproduce the 
OOM issue, I have some questions and ideas for related improvements.  
   
   ### Related Issues and Context
   
   - https://github.com/apache/airflow/pull/29390 
       - Limits UI rendering to 10,000 lines, but the webserver still reads the 
full logs.
       - The browser tab can still freeze on large logs.
   - https://github.com/apache/airflow/issues/29405
       - https://github.com/apache/airflow/pull/30729
       - Only adds a description of the `log_pos` and `end_of_log` metadata using 
`URLSafeSerializer`; the `get_log` API remains unchanged.
   - https://github.com/apache/airflow/issues/33625
       - Pagination on the UI side has not been resolved yet.
       - Suggests exposing `log_pos` and `end_of_log` as query parameters for 
better UI pagination.  
     As noted by @bbovenzi, this remains unresolved.
   
   ### Ideas for Related Improvements
   
   > I’d be happy to take on these improvements if there are no objections!
   
   1. **Expose `log_pos` and `end_of_log` as Query Parameters**:
      - Implement in Flask and backport to `v2.10.x`.
      - Implement in FastAPI (the new UI will have to implement the pagination 
logic as well).
   
   2. **Pagination for Legacy UI**:
      - Implement pagination for logs in the legacy UI after addressing the 
above.
   
   3. **Refactor `FileTaskHandler`**:
      - Address the root cause of OOM directly at the handler level.
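
As a rough illustration of the offset-based reading behind ideas 1 and 3, here is a minimal sketch in plain Python. It is not Airflow's `FileTaskHandler` or `get_log` code: `read_log_chunk` and `max_bytes` are hypothetical names, and `end_of_log` here only means the reader reached the current end of the file, an assumption that would not hold for a task that is still writing its log.

```python
import os
import tempfile


def read_log_chunk(path, log_pos=0, max_bytes=64 * 1024):
    """Read at most `max_bytes` from `path`, starting at byte offset `log_pos`.

    Hypothetical helper for illustration only. Returns the decoded text and a
    metadata dict shaped like the `log_pos` / `end_of_log` pair discussed above.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(log_pos)
        data = f.read(max_bytes)
    new_pos = log_pos + len(data)
    metadata = {"log_pos": new_pos, "end_of_log": new_pos >= size}
    # errors="replace" keeps decoding safe if a chunk boundary splits a
    # multi-byte character; a real implementation would align on line breaks.
    return data.decode("utf-8", errors="replace"), metadata


# Demo: write a small ASCII log, then page through it 1 KiB at a time.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, newline=""
) as tmp:
    tmp.write("".join(f"line {i}\n" for i in range(1000)))
    demo_path = tmp.name

pages = []
metadata = {"log_pos": 0, "end_of_log": False}
while not metadata["end_of_log"]:
    text, metadata = read_log_chunk(demo_path, metadata["log_pos"], max_bytes=1024)
    pages.append(text)

os.remove(demo_path)
```

With this shape, a client would send back the `log_pos` it last received and stop once `end_of_log` is true, so neither the server nor the browser has to hold the whole log at once.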
   
   Though I couldn't reliably reproduce OOM on the webserver, the browser tab 
crashing due to large logs remains an issue.
   
   ### Question
   
   How can I reliably reproduce the OOM issue when fetching large task logs?  
   I’ve tried observing memory usage with 
[Memray](https://bloomberg.github.io/memray/run.html) but found the OOM hard 
to reproduce.  
   - **Steps Taken**:
     - Used the DAG mentioned in [PR 
#29390](https://github.com/apache/airflow/pull/29390).
     - Triggered runs multiple times and managed to reproduce OOM only once.
     - Used `breeze start-airflow --python 3.9 --backend sqlite` as the 
environment.
   
   Any tips on how to consistently reproduce the issue? Or is there a specific 
environment setup recommended for this?
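
For a quick sanity check outside Airflow, the memory difference between reading a log in one call and reading it in chunks can be sketched with the standard-library `tracemalloc` module. This is a stand-in, not the Memray setup described above: `peak_bytes`, `read_whole`, and `read_chunked` are hypothetical helpers, the 8 MiB file is an arbitrary size, and `tracemalloc` only counts allocations made through Python's allocator, not the process RSS that triggers an actual OOM kill.

```python
import os
import tempfile
import tracemalloc


def peak_bytes(func, *args):
    """Return the peak traced allocation (in bytes) while running func(*args)."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def read_whole(path):
    # Loads the entire file into a single bytes object.
    with open(path, "rb") as f:
        return len(f.read())


def read_chunked(path, chunk_size=64 * 1024):
    # Holds roughly one or two chunks in memory at a time.
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Demo: an 8 MiB synthetic "log" (size chosen arbitrarily for illustration).
log_size = 8 * 1024 * 1024
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(b"x" * log_size)
    big_log = tmp.name

peak_whole = peak_bytes(read_whole, big_log)
peak_chunked = peak_bytes(read_chunked, big_log)
os.remove(big_log)

print(f"whole file: {peak_whole} bytes, chunked: {peak_chunked} bytes")
```

The whole-file peak scales with the log size while the chunked peak stays near the chunk size, which may help explain why the OOM only shows up when log size and available webserver memory line up.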
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
