Let me rephrase the proposal as I understand it (it is loaded with terms, code, and PRs, so I think everyone would benefit from a "layman's" description by someone who is trying to understand it).

* First of all - was the "zip format" a mistake? I understand that we are talking about switching to NDJSON only? (It can be gzipped or whatever, but that is more of a transport-layer matter?)
* Secondly - do I understand correctly that the proposal is to change this:

```
def get_log(
    ...
    accept: HeaderAcceptJsonOrNdjson,
    token: ..
```

into this (also making it a streaming API):

```
def get_log(
    ...
    accept: HeaderAcceptNdjson,
    # no token
```

and simply switch to the client reading the logs line-by-line from the API - effectively just streaming the response and yielding it line-by-line? Is that the correct understanding?

If that's a correct understanding (or maybe regardless), I have a few questions:

* What would be the proposed timeline for that change?
* What is the impact on current remote log handlers? Do all of them support direct streaming **today**, or do we need to complete any implementation there?
* Will that change work with older versions of providers (for those handlers), or would Airflow (after it is implemented) only work with newer versions of them?
* I guess retrieving logs via the API is something our users might use heavily, and I understand that this means an incompatible API change and the necessity for our users to modify their API clients (in case they use the JSON format? Or even if they use NDJSON?). Is there any way to provide a compatibility solution? Maybe we could have a compatibility shim / extra package (?) that provides the compatibility layer but is kept out of the core?

I think any such change should be considered from two angles:

* pain to maintainers
* pain to users

So my question is really: where does this change sit on the scale of pain for users and for maintainers?

J.

On Fri, Aug 15, 2025 at 6:20 PM Zhe-You(Jason) Liu <jason...@apache.org> wrote:

> Hi all,
>
> I would like to propose a change to the get_log API [1]: removing the
> log_pos metadata and discontinuing the JSON response format.
>
> Following the update “Update useLog to support application/x-ndjson #54445”
> [2], the frontend will adopt the application/x-ndjson format, which is more
> efficient for streaming logs. Additionally, with the upcoming fix “Support
> streaming log to the end for get_log API #54552” [3] (currently still a
> draft PR), the get_log API will support streaming logs to completion. This
> enhancement makes the log_pos metadata and the continuation_token logic [4]
> unnecessary.
>
> In the previous update, “Resolve OOM When Reading Large Logs in Webserver
> #49470” [5], I introduced LogStreamAccumulator [6] to handle the log_pos
> metadata by flushing the log stream to temporary files and reading them
> back. However, with the new streaming support, we can now yield the log
> stream directly to the end in a single API call, improving performance and
> reducing complexity.
>
> Benchmark results [7] show that, even after the #49470 refactor, using the
> JSON format with get_log can still cause OOM issues when reading large
> logs. Instead of continuing to support the JSON format, I suggest
> *replacing the JSON format with a Zip format*. This would allow users to
> conveniently download full logs while maintaining memory efficiency for the
> API server.
>
> I appreciate your consideration of this proposal and look forward to your
> feedback.
>
> Thank you!
>
> Best regards,
> Jason
>
> [1]: https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py#L75
> [2]: https://github.com/apache/airflow/pull/54445
> [3]: https://github.com/apache/airflow/pull/54552
> [4]: https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py#L166
> [5]: https://github.com/apache/airflow/pull/49470
> [6]: https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/utils/log/log_stream_accumulator.py
> [7]: https://github.com/apache/airflow/pull/49470#issuecomment-2908306229
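
For reference, a minimal sketch of what "reading the logs line-by-line" from an NDJSON stream could look like on the client side, plus the kind of compatibility shim asked about in the reply. This is an illustration under stated assumptions only: `iter_ndjson` and `collect_as_json` are hypothetical names, not Airflow or provider APIs; the `{"content": [...]}` shape is a stand-in, not the actual JSON response schema of get_log; the byte chunks stand in for a streamed HTTP response body; and the sample records are made up.

```python
import json
from collections.abc import Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed JSON object per newline-delimited line.

    `chunks` stands in for a streamed response body; chunk boundaries
    do not have to line up with line boundaries.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the remainder.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    # A final line without a trailing newline is still a record.
    if buffer.strip():
        yield json.loads(buffer)


def collect_as_json(chunks: Iterable[bytes]) -> dict:
    """Hypothetical shim: buffer the whole stream into a single dict."""
    return {"content": list(iter_ndjson(chunks))}


# Made-up sample data, deliberately split in the middle of a line.
sample = [b'{"event": "start"}\n{"eve', b'nt": "running"}\n', b'{"event": "done"}']
events = [record["event"] for record in iter_ndjson(sample)]
```

Note that a shim like `collect_as_json` holds the entire log in memory, so it would reintroduce on the client side the memory cost that streaming is meant to avoid on the API server; whether that trade-off is acceptable is part of the compatibility question raised above.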