Hi Remi,

Yes, I think this is a good way to think about the design space. (It is a shame 
that the fact that this is NOT about string interpolation, but something much 
more general and focused on security - even though made explicit in the JEP - 
has been lost in some of the wider discussions.)

You can make the distinction even clearer - reading from the spec - a template 
"\{x} + \{y}" can be thought of as sugar for the expression new 
$HiddenClassImplementsStringTemplate(List.of("", " + ", ""), List.of(x, y)). 
So, sure, it’s an object that has the potential to be a string, but it’s an 
object with a couple of lists in it. The fact that the embedded values are kept 
as a separate list, and so can be validated and dealt with using 
domain-specific logic, is the key to safety. You need to write code to 
transform template values into something else (perhaps a string). In the old 
model, that was the role of the processor (and the reason why processors came 
first: to remind you that the template needed processing to get a value); in 
the new model, it will be a method. I agree with you that any design that makes it easy 
to conflate templates with strings is a road to another 30+ years of injection 
attacks.
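
To make the "object with a couple of lists in it" picture concrete, here is a 
minimal sketch in plain Java. It does not use the preview API: SketchTemplate 
and HtmlSketch are names made up for this illustration, and HTML escaping is 
just one assumed example of domain-specific logic. The point is only that the 
method sees the embedded values separately from the literal fragments and can 
treat them differently.

```java
import java.util.List;

// Hypothetical stand-in for the object a template evaluates to:
// literal fragments and embedded values are kept in separate lists.
record SketchTemplate(List<String> fragments, List<Object> values) {
    SketchTemplate {
        // N embedded values are always surrounded by N + 1 fragments.
        if (fragments.size() != values.size() + 1) {
            throw new IllegalArgumentException("expected one more fragment than values");
        }
    }
}

final class HtmlSketch {
    private HtmlSketch() {}

    // Turns a template into a String, escaping only the embedded values.
    // The fragments come from the literal in the source code and are trusted.
    static String render(SketchTemplate t) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < t.values().size(); i++) {
            sb.append(t.fragments().get(i));
            sb.append(escape(String.valueOf(t.values().get(i))));
        }
        sb.append(t.fragments().get(t.fragments().size() - 1));
        return sb.toString();
    }

    // Deliberately small escaping routine, for illustration only.
    private static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

With x = "<b>" and y = 2, the template above corresponds to 
new SketchTemplate(List.of("", " + ", ""), List.of(x, y)), and 
HtmlSketch.render gives "&lt;b&gt; + 2": the value is escaped, the fragments 
are not.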

Gavin


On 16 Mar 2024, at 07:18, Remi Forax <[email protected]> wrote:



________________________________
From: "Maurizio Cimadamore" <[email protected]>
To: "Guy Steele" <[email protected]>
Cc: "amber-spec-experts" <[email protected]>
Sent: Friday, March 15, 2024 5:31:28 PM
Subject: Re: Update on String Templates (JEP 459)

Hi

On 15/03/2024 16:07, Guy Steele wrote:

Then again, now that I ponder the space of use cases, it may be that, despite 
my initial enthusiasm, having a separate string interpolation syntax may not 
carry its weight if its uses are relatively rare. We always have the option of 
using a string template and then applying an interpolation processor (which 
might be spelled `String.of(<template>)` or `(<template>).interpolate()` or 
some other way), and about all we lose from that approach is the ability to use 
string interpolation to specify a constant expression—for which we still have 
the old-fashioned alternative of using `+` concatenation. If we drop string 
interpolation, we can then drop the INTERPOLATION prefix, and we are back to a 
single-prefix model, and the remaining question is whether that prefix is 
optional, at least in some cases. Okay, I think I now have a better 
understanding of the relationships among the various proposals in the design 
space. Thanks for your patience.

I think the advantage of not having a string interpolation prefix is that 
interpolation is then “just another processor”, e.g. a static method somewhere 
that takes a string template and returns a String. Another String::format, in a 
way. So that leads to a rather uniform design.
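
As a rough sketch of what "a static method that takes a string template and 
returns a String" could look like, assuming a hypothetical TemplateSketch 
record in place of the real StringTemplate type (both TemplateSketch and 
Interpolation are invented names, not a proposed API):

```java
import java.util.List;

// Hypothetical stand-in for a string template: N values, N + 1 fragments.
record TemplateSketch(List<String> fragments, List<Object> values) {}

final class Interpolation {
    private Interpolation() {}

    // Plain interpolation: alternate fragments and values with no escaping.
    // Assumes fragments.size() == values.size() + 1.
    static String interpolate(TemplateSketch t) {
        List<String> fragments = t.fragments();
        List<Object> values = t.values();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            sb.append(fragments.get(i)).append(values.get(i));
        }
        return sb.append(fragments.get(fragments.size() - 1)).toString();
    }
}
```

For example, 
Interpolation.interpolate(new TemplateSketch(List.of("Hello, ", "!"), List.of("Guy"))) 
yields "Hello, Guy!". Nothing here needs language support beyond the template 
literal itself.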


And now that I have that better understanding, I think I lean toward (a) 
abandoning string interpolation and (b) having a single, short, _non-optional_ 
prefix for templates (“$” would be a plausible choice), on the grounds that I 
think it makes code more readable if templates are always distinguished up 
front from strings—and this is especially helpful when the templates are rather 
long and any `\{` present might be far from the beginning. It has a minimal 
number of cases to explain:

"…"      string literal, must not contain \{…}, type String
$"…"     template literal, may contain \{…}, type StringTemplate

Yep, I agree this is a very principled way to look at the problem.

[...]

This is how I like to explain the design space to myself.
We have two kinds of strings, tainted strings and untainted strings (this is 
not new, see [1]).
An untainted string is a string that can be escaped properly, in our case a 
StringTemplate. A tainted string is just a String.

We do not want a String to be a StringTemplate, because it means all tainted 
strings are untainted strings.
We do not want a StringTemplate to be a String, because it means that all 
untainted strings are tainted strings.
So both are different types, with neither a subtype relationship nor an 
automatic conversion between them.
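
A small sketch of what keeping the two types apart buys at an API boundary. 
QueryTemplate, BoundQuery and QuerySketch are hypothetical names used only for 
this illustration, and the SQL placeholder style is an assumption. The method 
accepts only the template type, so a plain String cannot be passed by accident.

```java
import java.util.List;

// Hypothetical template type, unrelated to String by subtyping or conversion.
record QueryTemplate(List<String> fragments, List<Object> values) {}

// SQL text with '?' placeholders plus the parameters to bind to them.
record BoundQuery(String sql, List<Object> parameters) {}

final class QuerySketch {
    private QuerySketch() {}

    // There is deliberately no overload taking a String.
    // Assumes fragments.size() == values.size() + 1.
    static BoundQuery bind(QueryTemplate t) {
        StringBuilder sql = new StringBuilder(t.fragments().get(0));
        for (int i = 0; i < t.values().size(); i++) {
            // Each embedded value becomes a placeholder, never SQL text.
            sql.append('?').append(t.fragments().get(i + 1));
        }
        return new BoundQuery(sql.toString(), List.copyOf(t.values()));
    }
}
```

Calling QuerySketch.bind("SELECT ...") with a String does not compile, while a 
template whose value is a hostile name such as x' OR '1'='1 ends up with that 
name as a bound parameter and "?" in the SQL text.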

For the literals, we need two different constructs, otherwise we will have a 
conversion between tainted and untainted strings.
We also need the literal that constructs an untainted string to be different, 
and marked up front, to easily distinguish an untainted string from a tainted 
string, so
- "..." constructs a String, a tainted string,
- TEMPLATE"..." constructs a StringTemplate, an untainted string.

About string interpolation, this is another way to create a String and this is 
not directly related to a string being tainted or not, so it is somewhat 
orthogonal in terms of design.
It cannot be a prefix like INTERPOLATE, because this is different in nature 
from TEMPLATE: TEMPLATE creates another kind of String, interpolation creates 
just a String.
Having a static method (a processor) that creates a String from a 
StringTemplate creates a common conduit to get a tainted string from any 
untainted string, which makes the distinction between untainted strings and 
tainted strings less relevant. So I would advise against going in that direction.


Maurizio

Rémi

[1] https://en.wikipedia.org/wiki/Taint_checking

