sbp commented on issue #339:
URL: 
https://github.com/apache/tooling-trusted-releases/issues/339#issuecomment-3571921474

   [CPython issue #90358](https://github.com/python/cpython/issues/90358), 
which later became a [Python forum 
thread](https://discuss.python.org/t/safer-logging-methods-for-f-strings-and-new-style-formatting/13802),
 discusses a security issue with f-strings. The poster points out that 
`logger.info(f"{untrusted}")` does not incur a template injection attack, but 
`logger.info(f"{untrusted}", some_dict)` does because the untrusted input may 
contain `%(example)999999999s` which will add nearly a GB of padding if 
`example` is a key in `some_dict`.
   
   One mitigation here is to disallow parameters at the interface layer. We 
can, and should, implement this in 
[`log.py`](https://github.com/apache/tooling-trusted-releases/blob/main/atr/log.py),
 which is a useful existing place to attach such constraints. This would enable 
us to keep using f-string style, but I will suggest a different approach in 
this comment.
   
   Another problem raised in the linked discussion is that raw templates allow 
simple grouping of logging statements, as in Sentry, which is lost with 
f-strings. I don't think that this is a consideration for ATR, but it may be 
for other internal ASF projects to which our code conventions may eventually 
apply. I think, however, that if there is to be a grouping facility it should 
be explicit and it should accept a symbol, probably an enum member, to identify 
the group.
   
   A consideration which has arisen multiple times in the present issue and in 
the linked discussion is performance. The focus seems to be mostly on values 
which are expensive to compute. Python will evaluate argument expressions 
before sending them to the logger, and in order to defer the computation we 
either need to add a conditional wrapper, as suggested by @dave2wave, or send a 
function into the logging mechanism itself, i.e. for continuation passing 
style. Both solutions have obvious disadvantages.
   
   On the other hand there has been less focus on the performance of the logger 
string formatting itself. The linked discussion contends that f-strings are 
generally fastest. What has received less to no attention is that this would 
depend on the speed of the `__str__` methods of the parameters, but I think in 
general this should be negligible and we do not need to consider it for ATR or 
in the ASF broadly.
   
   The original issue as opened by @sebbASF only asks that we document our 
decision, but our decision is contingent on our stance on all of the issues 
summarised above. The linked discussion mentions adding delayed f-strings to 
Python, or using [PEP 501](https://peps.python.org/pep-0501/), as a solution to 
some of these issues. PEP 501 eventually inspired t-strings, which were added 
to Python 3.14, released on 7 Oct 2025; we may already be able to migrate to 
Python 3.14. The PEP which introduced t-strings, [PEP 
750](https://peps.python.org/pep-0750/), has an [entire example section about 
structured 
logging](https://peps.python.org/pep-0750/#example-structured-logging).
   
   Although t-strings do not inherently support delayed evaluation, they do 
support a controlled form in that lambda expressions can be used as inline 
values. You can write, for example, `t"this is an {(lambda: 
expensive_value())}"`, and then ensure that your t-string evaluator calls any 
function with zero arguments in its parameter specification only when the 
string is actually being logged. We can define these evaluators as 
`logging.Formatter` subclasses to achieve that, as demonstrated in the example 
logging section in PEP 750.
   
   Using t-strings would also enable a pattern that was suggested by @potiuk 
not here but [in the linked 
discussion](https://discuss.python.org/t/safer-logging-methods-for-f-strings-and-new-style-formatting/13802/20):
   
   > Having f-strings, the only way we can actually do that is to get the 
“content” of secrets we want to mask and search the whole logging message to 
see if they are not in the logging message. This is SLOW and brittle. If we 
could use only debugf/infof style of messages, finding out if you are using a 
secret in one of your parameters, could be WAY faster
   
   We could perform scanning of secrets for masking in the t-string evaluator. 
Again, this would be efficient because it would only be called when evaluation 
was strictly necessary for logging. It would not require any change of syntax 
in the calling code, unlike the `(lambda: ...)` syntax required for deferred 
evaluation of variables.
   
   In short, t-strings carry the existing advantages of f-strings but with 
mitigations or solutions for various f-string design problems. The main 
drawback is that they require migration to Python 3.14. I have opened a 
separate issue to track the investigation into the feasibility of this 
migration, #346, but since we will migrate to Python 3.14 eventually whether or 
not it is immediately feasible, I suggest that we consider further evaluation 
of t-strings as a solution to the present issue. We can then link to this 
thread in the documentation to resolve the original issue as stated.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to