I am fine with the explanation - looks good,

On Sat, Aug 16, 2025 at 4:59 AM Zhe-You Liu <zhu424....@gmail.com> wrote:

> Thanks Daniel, Jarek,
>
> Sorry, I should have described the background of this proposal more clearly
> and also mentioned the back-compatibility plan.
>
> Additionally, the title should be **"Remove `log_pos` metadata and
> `continuation_token` logic for `get_log` API"**, which better reflects the
> proposal.
> The JSON format will remain supported for compatibility.
>
> For Daniel's
>
> > Since if your change comes out in 3.2, then providers that are not pinned
> >= 3.2 would still need to support the old ways, and the core
> filetaskhandler might need to continue to support the old way until 4.0.
> Right?
>
> This change will not break existing providers. The back-compatibility layer
> for provider task log handlers was already completed in PR #49470 [1]. This
> change will only affect the core FileTaskHandler.
>
> Providers are already planned to be refactored to support the new streaming
> read interface (returning a generator of `str` or `StructuredLogMessage`)
> instead of explicitly reading logs into a large list.
>
> For Jarek's
>
> > * 1st of all - was the "zip format" a mistake? I understand that we are
> talking about switching to NDJSon only? (It can be .gzipped or whatever,
> but that is more of the transport layer?
>
> Not a mistake, but a new feature. I should have separated the idea of "Zip
> format" as an independent point. That is, supporting "download full logs in
> Zip format" would be a new feature, unrelated to the streaming read API
> change.
>
> > * 2ndly - do I understand that the proposal is to change this:
>
> Yes, this proposal is mainly focused on removing the `log_pos` logic, which
> also implies removing the `continuation_token` logic.
>
> > And simply switch to reading the logs by the client line-by-line from the
> API - just streaming the answer effectively and yielding line-by-line?  Is
> that the correct understanding?
>
> Yes, exactly. The TaskInstanceLog frontend page will only need to call the
> `get_log` API once, and then it can stream logs line-by-line until
> completion after the #54552 fix.
>
> Before the #54552 fix, the FileTaskHandler *could only read content already
> flushed to the file and could not access content still being written.*
> Therefore, a polling mechanism was needed to detect new file changes and
> keep the API connection open for streaming content to the frontend. While
> implementing this polling mechanism, I also realized that removing the
> `log_pos` metadata is necessary so that the streaming API can return log
> records immediately. Otherwise, the frontend would be blocked until the
> entire log stream was finished.
> ( Since we need `LogStreamAccumulator` to count the `log_pos`, but it will
> flush the log stream to temp file then read them back. )
>
> > * what would be the proposed timeline for that change ?
>
> Ideally released in Airflow 3.1, since removing the `log_pos` metadata and
> `continuation_token` logic is a breaking change.
>
> > * what is the impact on current remote log handlers?
>
> No impact on current remote log handlers. Only the core FileTaskHandler and
> `get_log` API will be affected.
>
> > Are all of them supporting direct streaming **today** or do we need to
> complete any implementation there?
>
> Currently, only the CloudWatch log handler is in progress for supporting
> the new streaming read interface [2]. Other remote log handlers are planned
> to support the new interface, but are not yet implemented [3].
>
> > * will that change work with older versions of providers (for those
> handlers) or would we have to make Airflow (after it is implemented) only
> work with newer versions of those?
>
> The back-compatibility layer for provider task log handlers was already
> completed in PR #49470 [1].
>
> > * I guess retrieving logs via the api is something our users might use
> heavily and I understa that means incompatible API change, and necessity to
> make our users modify their APIs (in case they use Json format? Or even if
> they use ndjson?)  Is there any way to provide any compatibility solution ?
>
> The JSON format will remain supported for compatibility, but the `log_pos`
> metadata and `continuation_token` logic will be removed.
>
> > So my question is really - how much this change is on the scale of pain
> for
>
> Not much for maintainers, since the back-compatibility layer for remote log
> handlers is already implemented. We also have a plan to support the new
> streaming read interface across remote log handlers.
>
> For users, this will be a breaking change for the `get_log` API in JSON
> format due to the removal of `log_pos` metadata and `continuation_token`
> logic. The `ndjson` format will remain unchanged, but with the fix to the
> streaming issue, the API will now truly stream logs until the end and keep
> the connection open until the log is fully read.
>
> Thanks again for your feedback.
> I hope this clarifies the proposal and the back-compatibility plan.
>
> [1]:
>
> https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/utils/log/file_task_handler.py#L613-L630
> [2]: https://github.com/apache/airflow/pull/54054
> [3]: https://github.com/apache/airflow/issues/45079 ( under the "TODO
> Tasks" section )
>
> On Sat, Aug 16, 2025 at 12:49 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > Let me rephrase the proposal as I understand it (it is loaded with a few
> > terms and code and PRs so I think everyone would benefit with a "lame"
> > description of someone who tries to understand it.
> >
> > * 1st of all - was the "zip format" a mistake? I understand that we are
> > talking about switching to NDJSon only? (It can be .gzipped or whatever,
> > but that is more of the transport layer?
> > * 2ndly - do I understand that the proposal is to change this:
> > ```
> > def get_log(
> >      ...
> >     accept: HeaderAcceptJsonOrNdjson,
> >     token: ..
> > ```
> >
> > into this (also making it a streaming API):
> >
> > ```
> > def get_log(
> >     ...
> >     accept: HeaderAcceptNdjson,
> >    > no token
> > ```
> >
> > And simply switch to reading the logs by the client line-by-line from the
> > API - just streaming the answer effectively and yielding line-by-line?
> Is
> > that the correct understanding?
> >
> > If that's a correct understanding (or maybe regardless) I have few
> > questions:
> >
> > * what would be the proposed timeline for that change ?
> > * what is the impact on current remote log handlers? Are all of them
> > supporting direct streaming **today** or do we need to complete any
> > implementation there?
> > * will that change work with older versions of providers (for those
> > handlers) or would we have to make Airflow (after it is implemented) only
> > work with newer versions of those?
> > * I guess retrieving logs via the api is something our users might use
> > heavily and I understa that means incompatible API change, and necessity
> to
> > make our users modify their APIs (in case they use Json format? Or even
> if
> > they use ndjson?)  Is there any way to provide any compatibility
> solution ?
> > Maybe we could have some compatibility shim / extra package (?) that
> could
> > provide the compatibility layer but would be removed out of the core?
> >
> > I think any of such change should be considered from two angles:
> >
> > * pain to maintainers
> > * pain to users
> >
> > So my question is really - how much this change is on the scale of pain
> for
> > users / maintainers.
> >
> > J.
> >
> >
> >
> > On Fri, Aug 15, 2025 at 6:20 PM Zhe-You(Jason) Liu <jason...@apache.org>
> > wrote:
> >
> > > Hi all,
> > >
> > > I would like to propose a change to the get_log API [1]: removing the
> > > log_pos metadata and discontinuing the JSON response format.
> > >
> > > Following the update “Update useLog to support application/x-ndjson
> > #54445”
> > > [2], the frontend will adopt the application/x-ndjson format, which is
> > more
> > > efficient for streaming logs. Additionally, with the upcoming fix
> > “Support
> > > streaming log to the end for get_log API #54552” [3] (currently still a
> > > draft PR), the get_log API will support streaming logs to completion.
> > This
> > > enhancement makes the log_pos metadata and the continuation_token logic
> > [4]
> > > unnecessary.
> > >
> > > In the previous update, “Resolve OOM When Reading Large Logs in
> Webserver
> > > #49470” [5], I introduced LogStreamAccumulator [6] to handle the
> log_pos
> > > metadata by flushing the log stream to temporary files and reading them
> > > back. However, with the new streaming support, we can now yield the log
> > > stream directly to the end in a single API call, improving performance
> > and
> > > reducing complexity.
> > >
> > > Benchmark results [7] show that, even after the #49470 refactor, using
> > the
> > > JSON format with get_log can still cause OOM issues when reading large
> > > logs. Instead of continuing to support the JSON format, I suggest
> > > *replacing
> > > the JSON format with a Zip format*. This would allow users to
> > conveniently
> > > download full logs while maintaining memory efficiency for the API
> > server.
> > >
> > > I appreciate your consideration of this proposal and look forward to
> your
> > > feedback.
> > >
> > > Thank you!
> > >
> > > Best regards,
> > > Jason
> > >
> > > [1]:
> > >
> > >
> >
> https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py#L75
> > > [2]: https://github.com/apache/airflow/pull/54445
> > > [3]: https://github.com/apache/airflow/pull/54552
> > > [4]:
> > >
> > >
> >
> https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py#L166
> > > [5]: https://github.com/apache/airflow/pull/49470
> > > [6]:
> > >
> > >
> >
> https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/utils/log/log_stream_accumulator.py
> > > [7]:
> > https://github.com/apache/airflow/pull/49470#issuecomment-2908306229
> > >
> >
>

Reply via email to