jason810496 opened a new pull request, #49470:
URL: https://github.com/apache/airflow/pull/49470

   related issue: #45079 
   related PR: #45129
   related discussion on Slack: https://apache-airflow.slack.com/archives/CCZRF2U5A/p1736767159693839
   
   ## Why
   
   In short, this PR aims to eliminate OOM issues by:  
   - Replacing full log sorting with a **K-Way Merge**  
   - Making the entire log reading path **streamable** (using generators that `yield` log lines instead of returning a list of strings)  
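
As a rough illustration of the two points above (not the code in this PR), a K-way merge over already-sorted, lazily produced log streams can be sketched with the standard library's `heapq.merge`. The names `LogRecord`, `parse_stream`, and `merge_streams` are hypothetical, and the sketch assumes each line starts with an ISO 8601 timestamp followed by a space.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, Tuple

# Hypothetical record shape for illustration: (timestamp, message).
LogRecord = Tuple[datetime, str]


def parse_stream(lines: Iterable[str]) -> Iterator[LogRecord]:
    """Lazily turn 'ISO-timestamp message' lines into (timestamp, message) records."""
    for line in lines:
        stamp, _, message = line.partition(" ")
        yield datetime.fromisoformat(stamp), message


def merge_streams(*streams: Iterable[LogRecord]) -> Iterator[LogRecord]:
    """K-way merge of streams that are each already sorted by timestamp.

    heapq.merge keeps only one pending record per stream in memory,
    so the full log is never materialized or sorted as a whole.
    """
    return heapq.merge(*streams, key=lambda record: record[0])
```

With K input streams, memory use stays proportional to K rather than to the total number of log lines, which is the property that matters for avoiding OOM on large logs.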
   
   More detailed reasoning is already described in the linked issue.
   
   Because the old PR (#45129) accumulated too many conflicts, this PR reworks the changes on top of the latest `FileTaskHandler`.
   
   ## What
   
   This PR ports the original changes from #45129 to the current version of 
`FileTaskHandler` with the following updates:
   
   - Fixed a line-splitting error in chunked reads by using buffered line-splitting with a carry-over for the trailing partial line  
   - Adopted the new log metadata structure  
   - Introduced buffering for the log reader  
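
The carry-over idea in the first bullet can be sketched as follows. This is a minimal illustration under assumptions, not the PR's implementation: `iter_lines` is a hypothetical helper, and chunks are assumed to be already-decoded strings with `\n` line endings.

```python
from typing import Iterable, Iterator


def iter_lines(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete lines from text chunks that may end mid-line."""
    carry = ""  # trailing partial line left over from the previous chunk
    for chunk in chunks:
        buffer = carry + chunk
        # Everything before the last "\n" is complete; the remainder
        # (possibly empty) is carried over to the next chunk.
        *complete, carry = buffer.split("\n")
        yield from complete
    if carry:
        # Final line without a trailing newline.
        yield carry
```

Without the carry-over, a line that straddles a chunk boundary would be emitted as two separate fragments.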
   
   ## Note: Recent Changes in `FileTaskHandler`
   
   - Introduced `StructuredLogMessage` to represent each log record (#46827)
   - Added `RemoteLogIO` interface for remote log handling (#48491)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.