jason810496 opened a new pull request, #59753: URL: https://github.com/apache/airflow/pull/59753
related: https://github.com/apache/airflow/pull/49470, https://github.com/apache/airflow/pull/54813

## Why

After [Resolve OOM When Reading Large Logs in Webserver #49470](https://github.com/apache/airflow/pull/49470) and [Add stream method to RemoteIO #54813](https://github.com/apache/airflow/pull/54813), we now support a memory-efficient, stream-based read interface (the `RemoteIO.stream` method) for reading TaskInstance logs. However, we still need to implement the `stream` method for the corresponding RemoteIO on the provider side to make the whole read path memory-efficient.

## What

- Add a `stream` method to `GCSRemoteIO` to make the TaskInstance log read path memory-efficient
- Refactor the `read` method to call the `stream` method instead of duplicating common logic

## Verification

I tested the change across the following Airflow versions.

- `3.2.0` (`main` branch)
  - calls the `GCSRemoteIO.stream` method
  - command: `breeze start-airflow --backend postgres --mount-sources providers-and-tests --use-airflow-version apache/airflow:main`
  - <img width="1452" height="786" alt="apache/airflow:main" src="https://github.com/user-attachments/assets/d4da52d3-dd11-4fa4-b84a-9c1d99acfcdb" />
- `3.1.5`
  - calls the `GCSRemoteIO.read` method
  - command: `breeze start-airflow --backend postgres --mount-sources providers-and-tests --use-airflow-version 3.1.5`
  - <img width="1467" height="549" alt="3.1.5" src="https://github.com/user-attachments/assets/b97496ab-f6cb-4d23-87fa-7371038752e3" />
- `2.11.0`
  - calls the `GCSRemoteIO.read` method
  - command: `breeze start-airflow --backend postgres --mount-sources providers-and-tests --use-airflow-version 2.11.0`
  - <img width="1460" height="528" alt="2.11.0" src="https://github.com/user-attachments/assets/7366c1ce-4fd1-4e59-a7b1-80e87daf960f" />
- Screenshot of Google Cloud Storage
  - <img width="380" height="307" alt="GCS Screenshot" src="https://github.com/user-attachments/assets/4f177d3d-5f06-4732-9a1f-712c8618fb5c" />
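
For readers who have not seen the diff, the refactor described under "What" (an eager `read` that delegates to a lazy `stream`) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the provider's code: `InMemoryRemoteIO`, its `_blobs` dict, and the simplified return types are hypothetical stand-ins, and the real `GCSRemoteIO` talks to Google Cloud Storage and uses Airflow's own signatures.

```python
from __future__ import annotations

from collections.abc import Iterator


class InMemoryRemoteIO:
    """Hypothetical stand-in for a remote log handler (not the provider's code)."""

    def __init__(self, blobs: dict[str, str]) -> None:
        # Assumption: a dict of path -> text replaces the real object store.
        self._blobs = blobs

    def _iter_lines(self, path: str) -> Iterator[str]:
        # Yield one line at a time; a real handler would read the remote
        # object incrementally instead of holding the full text in memory.
        for line in self._blobs[path].splitlines():
            yield line

    def stream(self, relative_path: str) -> tuple[list[str], list[Iterator[str]] | None]:
        """Return (source messages, one lazy line iterator per matching object)."""
        matches = sorted(p for p in self._blobs if p.startswith(relative_path))
        if not matches:
            return [f"No logs found under {relative_path}"], None
        messages = [f"Found remote logs: {p}" for p in matches]
        return messages, [self._iter_lines(p) for p in matches]

    def read(self, relative_path: str) -> tuple[list[str], list[str] | None]:
        """Eager variant: reuse stream() and join each stream into one string."""
        messages, streams = self.stream(relative_path)
        if streams is None:
            return messages, None
        return messages, ["\n".join(lines) for lines in streams]
```

With this shape, callers that only know `read` (as in the `3.1.5` and `2.11.0` checks) and callers that use `stream` (as on `main`) share one lookup path, so the object-matching logic lives in a single place.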
