Thanks Daniel, Jarek,

Sorry, I should have described the background of this proposal more clearly
and also mentioned the backward-compatibility plan.

Additionally, the title should be **"Remove `log_pos` metadata and
`continuation_token` logic for the `get_log` API"**, which better reflects
the proposal.
The JSON format will remain supported for compatibility.

For Daniel's question:

> Since if your change comes out in 3.2, then providers that are not
> pinned >= 3.2 would still need to support the old ways, and the core
> filetaskhandler might need to continue to support the old way until
> 4.0. Right?

This change will not break existing providers. The backward-compatibility
layer for provider task log handlers was already completed in PR #49470
[1]; this change only affects the core FileTaskHandler.

The provider log handlers are already planned to be refactored to support
the new streaming read interface (returning a generator of `str` or
`StructuredLogMessage`) instead of reading the whole log into one large
list.
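
As a rough illustration (the function names below are hypothetical, not
the exact handler API), the difference between the two styles looks like
this:

```
from collections.abc import Generator

# Hypothetical sketch of the streaming read style, not the exact
# FileTaskHandler / provider interface.

def read_log_old(path: str) -> list[str]:
    # Old style: materialize the whole log in memory before returning.
    with open(path) as f:
        return f.readlines()

def read_log_new(path: str) -> Generator[str, None, None]:
    # New style: yield one record at a time so the API layer can stream
    # it straight to the client without buffering everything.
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")
```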

For Jarek's questions:

> * 1st of all - was the "zip format" a mistake? I understand that we are
> talking about switching to NDJSon only? (It can be .gzipped or whatever,
> but that is more of the transport layer?

Not a mistake, but a new feature. I should have presented the "Zip format"
idea as an independent point: supporting "download full logs as a Zip
archive" would be a new feature, unrelated to the streaming read API
change.

> * 2ndly - do I understand that the proposal is to change this:

Yes, this proposal is mainly focused on removing the `log_pos` logic, which
also implies removing the `continuation_token` logic.

> And simply switch to reading the logs by the client line-by-line from the
> API - just streaming the answer effectively and yielding line-by-line?  Is
> that the correct understanding?

Yes, exactly. After the #54552 fix, the TaskInstanceLog frontend page will
only need to call the `get_log` API once and can then stream logs
line-by-line until completion.
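
For example, a client could consume the whole log in a single streaming
request roughly like this (the endpoint path and record shape are
illustrative only, not the exact schema):

```
import json
import requests

# Illustrative only: the exact endpoint path and record fields may differ.
base = "http://localhost:8080"  # hypothetical Airflow API server
path = "/api/v2/dags/my_dag/dagRuns/run_1/taskInstances/my_task/logs/1"
headers = {"Accept": "application/x-ndjson"}

with requests.get(base + path, headers=headers, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            record = json.loads(line)  # one structured log record per line
            print(record)
```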

Before the #54552 fix, the FileTaskHandler *could only read content already
flushed to the file and could not access content still being written*.
Therefore, a polling mechanism was needed to detect new file changes and
keep the API connection open while streaming content to the frontend. While
implementing this polling mechanism, I also realized that removing the
`log_pos` metadata is necessary so that the streaming API can return log
records immediately; otherwise, the frontend would be blocked until the
entire log stream was finished. (Computing `log_pos` requires
`LogStreamAccumulator`, which flushes the log stream to a temporary file
and reads it back, so nothing can be returned until the whole stream has
been consumed.)
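
To make that concrete, here is a simplified sketch (not the actual
`LogStreamAccumulator` code) of why reporting `log_pos` forces full
buffering, while dropping it allows immediate yielding:

```
from collections.abc import Generator, Iterable

# Simplified sketch only; not the real LogStreamAccumulator implementation.

def read_with_log_pos(stream: Iterable[str]) -> tuple[list[str], int]:
    # To report log_pos, the whole stream must be consumed first,
    # so nothing reaches the client until the task log is fully read.
    lines = list(stream)
    return lines, len(lines)

def read_without_log_pos(stream: Iterable[str]) -> Generator[str, None, None]:
    # Without log_pos, each record can be forwarded as soon as it arrives.
    yield from stream
```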

> * what would be the proposed timeline for that change ?

Ideally this would be released in Airflow 3.1, since removing the `log_pos`
metadata and `continuation_token` logic is a breaking change.

> * what is the impact on current remote log handlers?

No impact on current remote log handlers. Only the core FileTaskHandler and
`get_log` API will be affected.

> Are all of them supporting direct streaming **today** or do we need to
> complete any implementation there?

Currently, only the CloudWatch log handler has an in-progress
implementation of the new streaming read interface [2]. The other remote
log handlers are planned to adopt the new interface but have not been
implemented yet [3].

> * will that change work with older versions of providers (for those
handlers) or would we have to make Airflow (after it is implemented) only
work with newer versions of those?

The backward-compatibility layer for provider task log handlers was already
completed in PR #49470 [1], so older provider versions will keep working.

> * I guess retrieving logs via the api is something our users might use
> heavily and I understa that means incompatible API change, and necessity to
> make our users modify their APIs (in case they use Json format? Or even if
> they use ndjson?)  Is there any way to provide any compatibility solution ?

The JSON format will remain supported for compatibility, but the `log_pos`
metadata and `continuation_token` logic will be removed.
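
To make the user-facing change concrete, the shapes below are an
approximation from memory (field names may not match the schema exactly):
the JSON response simply loses the continuation token, and live tailing
moves to the `application/x-ndjson` stream:

```
# Approximate response shapes, for illustration only.
old_json_response = {
    "content": ["[2025-08-15, 10:00:00] INFO - task started", "..."],
    "continuation_token": "opaque-token-encoding-log_pos",
}

new_json_response = {
    "content": ["[2025-08-15, 10:00:00] INFO - task started", "..."],
    # no continuation_token / log_pos; clients that need live logs
    # should switch to the application/x-ndjson streaming response
}
```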

> So my question is really - how much this change is on the scale of pain
> for users / maintainers.

Not much for maintainers, since the backward-compatibility layer for remote
log handlers is already implemented, and there is a plan to support the new
streaming read interface across the remote log handlers.

For users, this will be a breaking change to the `get_log` API's JSON
format, due to the removal of the `log_pos` metadata and
`continuation_token` logic. The `ndjson` format itself will remain
unchanged, but with the streaming fix the API will now truly stream logs to
the end and keep the connection open until the log has been fully read.

Thanks again for your feedback.
I hope this clarifies the proposal and the backward-compatibility plan.

[1]:
https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/utils/log/file_task_handler.py#L613-L630
[2]: https://github.com/apache/airflow/pull/54054
[3]: https://github.com/apache/airflow/issues/45079 ( under the "TODO
Tasks" section )

On Sat, Aug 16, 2025 at 12:49 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Let me rephrase the proposal as I understand it (it is loaded with a few
> terms and code and PRs so I think everyone would benefit with a "lame"
> description of someone who tries to understand it.
>
> * 1st of all - was the "zip format" a mistake? I understand that we are
> talking about switching to NDJSon only? (It can be .gzipped or whatever,
> but that is more of the transport layer?
> * 2ndly - do I understand that the proposal is to change this:
> ```
> def get_log(
>      ...
>     accept: HeaderAcceptJsonOrNdjson,
>     token: ..
> ```
>
> into this (also making it a streaming API):
>
> ```
> def get_log(
>     ...
>     accept: HeaderAcceptNdjson,
>    > no token
> ```
>
> And simply switch to reading the logs by the client line-by-line from the
> API - just streaming the answer effectively and yielding line-by-line?  Is
> that the correct understanding?
>
> If that's a correct understanding (or maybe regardless) I have few
> questions:
>
> * what would be the proposed timeline for that change ?
> * what is the impact on current remote log handlers? Are all of them
> supporting direct streaming **today** or do we need to complete any
> implementation there?
> * will that change work with older versions of providers (for those
> handlers) or would we have to make Airflow (after it is implemented) only
> work with newer versions of those?
> * I guess retrieving logs via the api is something our users might use
> heavily and I understa that means incompatible API change, and necessity to
> make our users modify their APIs (in case they use Json format? Or even if
> they use ndjson?)  Is there any way to provide any compatibility solution ?
> Maybe we could have some compatibility shim / extra package (?) that could
> provide the compatibility layer but would be removed out of the core?
>
> I think any of such change should be considered from two angles:
>
> * pain to maintainers
> * pain to users
>
> So my question is really - how much this change is on the scale of pain for
> users / maintainers.
>
> J.
>
>
>
> On Fri, Aug 15, 2025 at 6:20 PM Zhe-You(Jason) Liu <jason...@apache.org>
> wrote:
>
> > Hi all,
> >
> > I would like to propose a change to the get_log API [1]: removing the
> > log_pos metadata and discontinuing the JSON response format.
> >
> > Following the update “Update useLog to support application/x-ndjson
> #54445”
> > [2], the frontend will adopt the application/x-ndjson format, which is
> more
> > efficient for streaming logs. Additionally, with the upcoming fix
> “Support
> > streaming log to the end for get_log API #54552” [3] (currently still a
> > draft PR), the get_log API will support streaming logs to completion.
> This
> > enhancement makes the log_pos metadata and the continuation_token logic
> [4]
> > unnecessary.
> >
> > In the previous update, “Resolve OOM When Reading Large Logs in Webserver
> > #49470” [5], I introduced LogStreamAccumulator [6] to handle the log_pos
> > metadata by flushing the log stream to temporary files and reading them
> > back. However, with the new streaming support, we can now yield the log
> > stream directly to the end in a single API call, improving performance
> and
> > reducing complexity.
> >
> > Benchmark results [7] show that, even after the #49470 refactor, using
> the
> > JSON format with get_log can still cause OOM issues when reading large
> > logs. Instead of continuing to support the JSON format, I suggest
> > *replacing
> > the JSON format with a Zip format*. This would allow users to
> conveniently
> > download full logs while maintaining memory efficiency for the API
> server.
> >
> > I appreciate your consideration of this proposal and look forward to your
> > feedback.
> >
> > Thank you!
> >
> > Best regards,
> > Jason
> >
> > [1]:
> >
> >
> https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py#L75
> > [2]: https://github.com/apache/airflow/pull/54445
> > [3]: https://github.com/apache/airflow/pull/54552
> > [4]:
> >
> >
> https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py#L166
> > [5]: https://github.com/apache/airflow/pull/49470
> > [6]:
> >
> >
> https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/utils/log/log_stream_accumulator.py
> > [7]:
> https://github.com/apache/airflow/pull/49470#issuecomment-2908306229
> >
>
