This all seems like a good direction in general. You just have to be mindful of backcompat issues; I suspect there are gremlins there. If your change ships in 3.2, then providers that are not pinned to >= 3.2 would still need to support the old behavior, and the core FileTaskHandler might need to keep supporting it until 4.0. Right? Maybe you could speak to the backcompat strategy for this.
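To make the concern concrete, here is a rough sketch of the kind of version gate a provider handler might need while both behaviors coexist. Everything in it is an assumption for illustration: `STREAMING_MIN_VERSION`, `parse_major_minor`, `read_log_lines`, and the two reader callables are hypothetical names, not Airflow APIs, and the legacy return shape (text plus a metadata dict carrying `log_pos`) is simplified.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical cutoff: the core release in which the streaming-only contract ships.
STREAMING_MIN_VERSION = (3, 2)


def parse_major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a string such as '3.2.0' or '3.2.0rc1'.

    Assumes the string contains at least 'major.minor'.
    """
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def read_log_lines(
    core_version: str,
    new_reader: Callable[[], Iterable[str]],
    legacy_reader: Callable[[], tuple[str, dict]],
) -> Iterator[str]:
    """Yield log lines, choosing the code path from the core version.

    new_reader: hypothetical streaming reader that yields lines to the end.
    legacy_reader: hypothetical old-style reader returning (text, metadata),
        where metadata would carry log_pos for continuation.
    """
    if parse_major_minor(core_version) >= STREAMING_MIN_VERSION:
        # New path: stream straight through, no log_pos bookkeeping.
        yield from new_reader()
    else:
        # Old path: the caller still gets lines; log_pos is ignored here.
        text, _metadata = legacy_reader()
        yield from text.splitlines()
```

A gate like this has to live in every provider that supports cores on both sides of the cutoff, which is why the pinning question matters.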
On Fri, Aug 15, 2025 at 9:20 AM Zhe-You(Jason) Liu <jason...@apache.org> wrote:

> Hi all,
>
> I would like to propose a change to the get_log API [1]: removing the
> log_pos metadata and discontinuing the JSON response format.
>
> Following the update “Update useLog to support application/x-ndjson #54445”
> [2], the frontend will adopt the application/x-ndjson format, which is more
> efficient for streaming logs. Additionally, with the upcoming fix “Support
> streaming log to the end for get_log API #54552” [3] (currently still a
> draft PR), the get_log API will support streaming logs to completion. This
> enhancement makes the log_pos metadata and the continuation_token logic [4]
> unnecessary.
>
> In the previous update, “Resolve OOM When Reading Large Logs in Webserver
> #49470” [5], I introduced LogStreamAccumulator [6] to handle the log_pos
> metadata by flushing the log stream to temporary files and reading them
> back. However, with the new streaming support, we can now yield the log
> stream directly to the end in a single API call, improving performance and
> reducing complexity.
>
> Benchmark results [7] show that, even after the #49470 refactor, using the
> JSON format with get_log can still cause OOM issues when reading large
> logs. Instead of continuing to support the JSON format, I suggest
> *replacing the JSON format with a Zip format*. This would allow users to
> conveniently download full logs while maintaining memory efficiency for
> the API server.
>
> I appreciate your consideration of this proposal and look forward to your
> feedback.
>
> Thank you!
>
> Best regards,
> Jason
>
> [1]: https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py#L75
> [2]: https://github.com/apache/airflow/pull/54445
> [3]: https://github.com/apache/airflow/pull/54552
> [4]: https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py#L166
> [5]: https://github.com/apache/airflow/pull/49470
> [6]: https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/utils/log/log_stream_accumulator.py
> [7]: https://github.com/apache/airflow/pull/49470#issuecomment-2908306229
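
For anyone following along, the memory argument in the quoted proposal can be illustrated with a small sketch. It is not the get_log implementation; `to_ndjson` and `to_json` are hypothetical helpers, and the record shape is made up. The point is only that an NDJSON body can be produced one record at a time, while a single JSON document needs every record in memory before it can be serialized.

```python
import json
from typing import Iterable, Iterator


def to_ndjson(records: Iterable[dict]) -> Iterator[str]:
    """Yield one JSON document per line (application/x-ndjson style).

    Only the current record is held in memory, so the source can be a
    generator reading from a file or a remote log store.
    """
    for record in records:
        yield json.dumps(record) + "\n"


def to_json(records: Iterable[dict]) -> str:
    """Build a single JSON document.

    The whole record list is materialized before serialization, so memory
    use grows with the size of the log.
    """
    return json.dumps({"content": list(records)})
```

A streaming HTTP response can consume `to_ndjson(...)` chunk by chunk, whereas the output of `to_json(...)` only exists once the full log has been read.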