Hi Remi,
Yes, I think this is a good way to think about the design space. (It is a shame
that the fact that this is NOT about string interpolation, but something much
more general and focused on security - even though made explicit in the JEP -
has been lost in some of the wider discussions.)
You can make the distinction even clearer - reading from the spec - a template
"\{x} + \{y}" can be thought of as sugar for the expression new
$HiddenClassImplementsStringTemplate(List.of("", " + ", ""), List.of(x, y)).
So, sure, it’s an object that has the potential to be a string, but it’s an
object with a couple of lists in it. The fact that the embedded values are kept
as a separate list, and so can be validated and dealt with using
domain-specific logic, is the key to safety. You need to write code to
transform template values into something else (perhaps a string). In the old
model, that was the role of processors (and the reason why they came first -
to remind you that the template needed processing to get a value), and with the
new model it will be a method. I agree with you that any design that makes it easy
to conflate templates with strings is a road to another 30+ years of injection
attacks.
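
To make the "object with a couple of lists in it" picture concrete, here is a rough sketch in plain Java with no preview features. `Template`, `TemplateDemo` and `toHtmlText` are hypothetical names standing in for StringTemplate and for whatever method ends up doing the processing; none of them are JDK APIs. The HTML escaping is only one example of domain-specific logic and is deliberately minimal.

```java
import java.util.List;

// Hypothetical stand-in for StringTemplate: fragments and values are kept apart.
record Template(List<String> fragments, List<Object> values) {
    Template {
        // n embedded values always sit between n + 1 literal fragments.
        if (fragments.size() != values.size() + 1) {
            throw new IllegalArgumentException("need exactly one more fragment than values");
        }
        fragments = List.copyOf(fragments);
        values = List.copyOf(values);
    }
}

final class TemplateDemo {
    // Escape the characters that are significant in HTML text (assumed policy for this sketch).
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Domain-specific processing: fragments are trusted as-is, every value is escaped.
    static String toHtmlText(Template t) {
        StringBuilder sb = new StringBuilder(t.fragments().get(0));
        for (int i = 0; i < t.values().size(); i++) {
            sb.append(escapeHtml(String.valueOf(t.values().get(i))));
            sb.append(t.fragments().get(i + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Object x = "<b>";
        Object y = 2;
        // Roughly what the template "\{x} + \{y}" would carry.
        Template t = new Template(List.of("", " + ", ""), List.of(x, y));
        System.out.println(toHtmlText(t)); // &lt;b&gt; + 2
    }
}
```

The point of the sketch is only that the values are still available as values when `toHtmlText` runs, so the method can decide what to do with each one before any string exists.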
Gavin
On 16 Mar 2024, at 07:18, Remi Forax <[email protected]> wrote:
________________________________
From: "Maurizio Cimadamore" <[email protected]>
To: "Guy Steele" <[email protected]>
Cc: "amber-spec-experts" <[email protected]>
Sent: Friday, March 15, 2024 5:31:28 PM
Subject: Re: Update on String Templates (JEP 459)
Hi
On 15/03/2024 16:07, Guy Steele wrote:
Then again, now that I ponder the space of use cases, it may be that, despite
my initial enthusiasm, having a separate string interpolation syntax may not
carry its weight if its uses are relatively rare. We always have the option of
using a string template and then applying an interpolation processor (which
might be spelled `String.of(<template>)` or `(<template>).interpolate()` or
some other way), and about all we lose from that approach is the ability to use
string interpolation to specify a constant expression—for which we still have
the old-fashioned alternative of using `+` concatenation. If we drop string
interpolation, we can then drop the INTERPOLATION prefix, and we are back to a
single-prefix model, and the remaining question is whether that prefix is
optional, at least in some cases. Okay, I think I now have a better
understanding of the relationships among the various proposals in the design
space. Thanks for your patience.
I think the advantage of not having a string interpolation prefix is that
then interpolation is “just another processor”, e.g. a static method somewhere
that takes a string template and returns a String. Another String::format, in a
way. So that leads to a rather uniform design.
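
As a sketch of what "just another processor" could look like, the following plain Java treats interpolation as an ordinary static method from a template to a String. `SimpleTemplate` and `Processors.interpolate` are hypothetical names used for illustration, not the JDK's StringTemplate API or any agreed spelling.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for StringTemplate (not the JDK type).
record SimpleTemplate(List<String> fragments, List<Object> values) {}

final class Processors {
    // Plain interpolation: concatenate fragments and values in order.
    // Assumes fragments.size() == values.size() + 1.
    static String interpolate(SimpleTemplate t) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < t.values().size(); i++) {
            sb.append(t.fragments().get(i)).append(t.values().get(i));
        }
        return sb.append(t.fragments().get(t.values().size())).toString();
    }

    public static void main(String[] args) {
        SimpleTemplate t =
                new SimpleTemplate(List.of("", " + ", " = ", ""), List.of(1, 2, 3));
        // Interpolation is an ordinary method, so it can be passed around like any other processor.
        Function<SimpleTemplate, String> processor = Processors::interpolate;
        System.out.println(processor.apply(t)); // 1 + 2 = 3
    }
}
```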
And now that I have that better understanding, I think I lean toward (a)
abandoning string interpolation and (b) having a single, short, _non-optional_
prefix for templates (“$” would be a plausible choice), on the grounds that I
think it makes code more readable if templates are always distinguished up
front from strings—and this is especially helpful when the templates are rather
long and any `\{` present might be far from the beginning. It has a minimal
number of cases to explain:
"…" string literal, must not contain \{…}, type String
$"…" template literal, may contain \{…}, type StringTemplate
Yep, I agree this is a very principled way to look at the problem.
[...]
This is how I like to explain the design space to myself.
We have two kinds of strings, tainted strings and untainted strings (this is
not new, see [1]).
An untainted string is a string that can be escaped properly, in our case a
StringTemplate. A tainted string is just a String.
We do not want a String to be a StringTemplate, because it would mean that all
tainted strings are untainted strings.
We do not want a StringTemplate to be a String, because it would mean that all
untainted strings are tainted strings.
So both are different types, with neither a subtype relationship nor an
automatic conversion between them.
For the literals, we need two different constructs, otherwise we will have a
conversion between tainted and untainted strings.
We also need the literal that constructs an untainted string to be different,
and visibly so up front, to easily distinguish an untainted string from a
tainted string, so
- "..." constructs a String, a tainted string,
- TEMPLATE"..." constructs a StringTemplate, an untainted string.
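
A small sketch of why the type separation matters, in plain Java. `Untainted`, `BoundQuery` and `QuerySink` are hypothetical names; `Untainted` plays the role of StringTemplate and is a record only to keep the example short (in the real design only the template literal would construct it). The sink turns the fragments into SQL text with `?` placeholders and keeps the values as bind parameters, so they never become part of the SQL text.

```java
import java.util.List;

// Hypothetical untainted type: unrelated to String, no subtyping and no conversion.
record Untainted(List<String> fragments, List<Object> values) {}

// Hypothetical result: SQL text with '?' placeholders plus the values to bind.
record BoundQuery(String sql, List<Object> parameters) {}

final class QuerySink {
    // Accepts only Untainted, so a concatenated String cannot reach this sink.
    static BoundQuery toBoundQuery(Untainted t) {
        // Fragments come from the literal; values stay out of the SQL text.
        return new BoundQuery(String.join("?", t.fragments()), t.values());
    }

    public static void main(String[] args) {
        String name = "x' OR '1'='1";
        Untainted t = new Untainted(
                List.of("SELECT * FROM users WHERE name = ", ""), List.of(name));
        BoundQuery q = toBoundQuery(t);
        System.out.println(q.sql());        // SELECT * FROM users WHERE name = ?
        System.out.println(q.parameters()); // [x' OR '1'='1]
        // toBoundQuery("... name = '" + name + "'"); // does not compile: a String is not an Untainted
    }
}
```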
About string interpolation, this is another way to create a String and it is
not directly related to a string being tainted or not, so it's kind of
orthogonal in terms of design.
It cannot be a prefix like INTERPOLATE, because this is different in nature
from TEMPLATE: TEMPLATE creates another kind of string, whereas interpolation
creates just a String.
Having a static method (a processor) that creates a String from a
StringTemplate creates a common conduit to get a tainted string from any
untainted string, which makes the distinction between untainted strings and
tainted strings less relevant. So I would advise against going in that direction.
Maurizio
Rémi
[1] https://en.wikipedia.org/wiki/Taint_checking