Re: [I] Document why f-strings are used for logging calls (tooling-trusted-releases)

via GitHub Mon, 24 Nov 2025 09:25:01 -0800


sbp commented on issue #339:
URL: 
https://github.com/apache/tooling-trusted-releases/issues/339#issuecomment-3571921474

[CPython issue #90358](https://github.com/python/cpython/issues/90358),
which later became a [Python forum
thread](https://discuss.python.org/t/safer-logging-methods-for-f-strings-and-new-style-formatting/13802),
discusses a security issue with f-strings. The poster points out that
`logger.info(f"{untrusted}")` does not incur a template injection attack, but
`logger.info(f"{untrusted}", some_dict)` does because the untrusted input may
contain `%(example)999999999s` which will add nearly a GB of padding if
`example` is a key in `some_dict`.

One mitigation here is to disallow parameters at the interface layer. We
can, and should, implement this in
[`log.py`](https://github.com/apache/tooling-trusted-releases/blob/main/atr/log.py),
which is a useful existing place to attach such constraints. This would enable
us to keep using f-string style, but I will suggest a different approach in
this comment.

Another problem raised in the linked discussion is that raw templates allow
simple grouping of logging statements, as in Sentry, which is lost with
f-strings. I don't think that this is a consideration for ATR, but it may be
for other internal ASF projects to which our code conventions may eventually
apply. I think, however, that if there is to be a grouping facility it should
be explicit and it should accept a symbol, probably an enum member, to identify
the group.

A consideration which has arisen multiple times in the present issue and in
the linked discussion is performance. The focus seems to be mostly on values
which are expensive to compute. Python will evaluate argument expressions
before sending them to the logger, and in order to defer the computation we
either need to add a conditional wrapper, as suggested by @dave2wave, or send a
function into the logging mechanism itself, i.e. for continuation passing
style. Both solutions have obvious disadvantages.

On the other hand there has been less focus on the performance of the logger
string formatting itself. The linked discussion contends that f-strings are
generally fastest. What has received less to no attention is that this would
depend on the speed of the `__str__` methods of the parameters, but I think in
general this should be negligible and we do not need to consider it for ATR or
in the ASF broadly.

The original issue as opened by @sebbASF only asks that we document our
decision, but our decision is contingent on our stance on all of the issues
summarised above. The linked discussion mentions adding delayed f-strings to
Python, or using [PEP 501](https://peps.python.org/pep-0501/), as a solution to
some of these issues. PEP 501 eventually inspired t-strings, which were added
to Python 3.14, released on 7 Oct 2025; we may already be able to migrate to
Python 3.14. The PEP which introduced t-strings, [PEP
750](https://peps.python.org/pep-0750/), has an [entire example section about
structured
logging](https://peps.python.org/pep-0750/#example-structured-logging).

Although t-strings do not inherently support delayed evaluation, they do
support a controlled form in that lambda expressions can be used as inline
values. You can write, for example, `t"this is an {(lambda:
expensive_value())}"`, and then ensure that your t-string evaluator calls any
function with zero arguments in its parameter specification only when the
string is actually being logged. We can define these evaluators as
`logging.Formatter` subclasses to achieve that, as demonstrated in the example
logging section in PEP 750.

Using t-strings would also enable a pattern that was suggested by @potiuk
not here but [in the linked
discussion](https://discuss.python.org/t/safer-logging-methods-for-f-strings-and-new-style-formatting/13802/20):

> Having f-strings, the only way we can actually do that is to get the
“content” of secrets we want to mask and search the whole logging message to
see if they are not in the logging message. This is SLOW and brittle. If we
could use only debugf/infof style of messages, finding out if you are using a
secret in one of your parameters, could be WAY faster

We could perform scanning of secrets for masking in the t-string evaluator.
Again, this would be efficient because it would only be called when evaluation
was strictly necessary for logging. It would not require any change of syntax
in the calling code, unlike the `(lambda: ...)` syntax required for deferred
evaluation of variables.

In short, t-strings carry the existing advantages of f-strings but with
mitigations or solutions for various f-string design problems. The main
drawback is that they require migration to Python 3.14. I have opened a
separate issue to track the investigation into the feasibility of this
migration, #346, but since we will migrate to Python 3.14 eventually whether or
not it is immediately feasible, I suggest that we consider further evaluation
of t-strings as a solution to the present issue. We can then link to this
thread in the documentation to resolve the original issue as stated.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] Document why f-strings are used for logging calls (tooling-trusted-releases)

Reply via email to