sbp commented on issue #339: URL: https://github.com/apache/tooling-trusted-releases/issues/339#issuecomment-3571921474
[CPython issue #90358](https://github.com/python/cpython/issues/90358), which later became a [Python forum thread](https://discuss.python.org/t/safer-logging-methods-for-f-strings-and-new-style-formatting/13802), discusses a security issue with f-strings. The poster points out that `logger.info(f"{untrusted}")` does not incur a template injection attack, but `logger.info(f"{untrusted}", some_dict)` does because the untrusted input may contain `%(example)999999999s` which will add nearly a GB of padding if `example` is a key in `some_dict`. One mitigation here is to disallow parameters at the interface layer. We can, and should, implement this in [`log.py`](https://github.com/apache/tooling-trusted-releases/blob/main/atr/log.py), which is a useful existing place to attach such constraints. This would enable us to keep using f-string style, but I will suggest a different approach in this comment. Another problem raised in the linked discussion is that raw templates allow simple grouping of logging statements, as in Sentry, which is lost with f-strings. I don't think that this is a consideration for ATR, but it may be for other internal ASF projects to which our code conventions may eventually apply. I think, however, that if there is to be a grouping facility it should be explicit and it should accept a symbol, probably an enum member, to identify the group. A consideration which has arisen multiple times in the present issue and in the linked discussion is performance. The focus seems to be mostly on values which are expensive to compute. Python will evaluate argument expressions before sending them to the logger, and in order to defer the computation we either need to add a conditional wrapper, as suggested by @dave2wave, or send a function into the logging mechanism itself, i.e. for continuation passing style. Both solutions have obvious disadvantages. On the other hand there has been less focus on the performance of the logger string formatting itself. The linked discussion contends that f-strings are generally fastest. What has received less to no attention is that this would depend on the speed of the `__str__` methods of the parameters, but I think in general this should be negligible and we do not need to consider it for ATR or in the ASF broadly. The original issue as opened by @sebbASF only asks that we document our decision, but our decision is contingent on our stance on all of the issues summarised above. The linked discussion mentions adding delayed f-strings to Python, or using [PEP 501](https://peps.python.org/pep-0501/), as a solution to some of these issues. PEP 501 eventually inspired t-strings, which were added to Python 3.14, released on 7 Oct 2025; we may already be able to migrate to Python 3.14. The PEP which introduced t-strings, [PEP 750](https://peps.python.org/pep-0750/), has an [entire example section about structured logging](https://peps.python.org/pep-0750/#example-structured-logging). Although t-strings do not inherently support delayed evaluation, they do support a controlled form in that lambda expressions can be used as inline values. You can write, for example, `t"this is an {(lambda: expensive_value())}"`, and then ensure that your t-string evaluator calls any function with zero arguments in its parameter specification only when the string is actually being logged. We can define these evaluators as `logging.Formatter` subclasses to achieve that, as demonstrated in the example logging section in PEP 750. Using t-strings would also enable a pattern that was suggested by @potiuk not here but [in the linked discussion](https://discuss.python.org/t/safer-logging-methods-for-f-strings-and-new-style-formatting/13802/20): > Having f-strings, the only way we can actually do that is to get the “content” of secrets we want to mask and search the whole logging message to see if they are not in the logging message. This is SLOW and brittle. If we could use only debugf/infof style of messages, finding out if you are using a secret in one of your parameters, could be WAY faster We could perform scanning of secrets for masking in the t-string evaluator. Again, this would be efficient because it would only be called when evaluation was strictly necessary for logging. It would not require any change of syntax in the calling code, unlike the `(lambda: ...)` syntax required for deferred evaluation of variables. In short, t-strings carry the existing advantages of f-strings but with mitigations or solutions for various f-string design problems. The main drawback is that they require migration to Python 3.14. I have opened a separate issue to track the investigation into the feasibility of this migration, #346, but since we will migrate to Python 3.14 eventually whether or not it is immediately feasible, I suggest that we consider further evaluation of t-strings as a solution to the present issue. We can then link to this thread in the documentation to resolve the original issue as stated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
