Let's pull on this string some more. Assuming we settled on disjoint
types and syntaxes, with no magic conversions, what library support do
we need directly for ST? I am thinking (please, let's focus on the
functionality before we nitpick the names):
// on String
static join(StringTemplate) // previously STR
// on StringTemplate
String join() // STR,
instance/suffix version
static StringTemplate join(StringTemplate...) // + for string templates
This is a pleasantly short set; is anything missing? (Not addressing
the "which things were previously processors, but now need API points"
right now -- that's a separate discussion.)
On 3/18/2024 9:38 AM, Brian Goetz wrote:
I think this has been a good discussion, and it looks like we're
starting to see some convergence.
I think we keep trying to exploit ambiguity / implicitness, and it
doesn't go well:
- Many users want STR to be the "implicit processor", but that isn't
good for security
- We tried reusing the String delimiters for string templates to
reduce the perception of how many different things there are here, but
that creates cognitive load (can't tell strings from templates without
parsing the entire contents), among other problems
- We tried making String a poly expression (and other tricks) to
reduce the number of explicit conversions, but that created problems too
John's characterization captures the feeling and eventual conclusion
that I think many of us share:
I kind of like Guy’s offensive-to-everyone suggestion that $ is required to
make a true ST.
Indeed, my first reaction to the $ sigil was "please no", but I am
grudgingly coming to the conclusion that we should stop trying to
implicitly "just figure out what the user wants" and acknowledge the
reality: templates are not strings, strings are not templates, and
they can be converted to each other with ... methods, just like any
other relatable types. So string literals are as they always were;
string templates are a new thing, whose syntax and type is disjoint
from that of strings, as Guy also seems to be converging on:
And now that I have that better understanding, I think I lean toward
(a) abandoning string interpolation and (b) having a single, short,
_non-optional_ prefix for templates (“$” would be a plausible
choice), on the grounds that I think it makes code more readable if
templates are always distinguished up front from strings—and this is
especially helpful when the templates are rather long and any `\{`
present might be far from the beginning. It has a minimal number of
cases to explain:
“…” string literal, must not contain \{…}, type String
$”…” template literal, may contain \{…}, type StringTemplate
(concrete syntax TBB (to be bikeshod), along with the spellings of S
-> ST and ST -> S.)
Some more useful observations:
- The toString behavior cannot be mere interpolation. Besides the
principled objections and inevitable propping-open-the-security-door
that this would lead to, people will quickly learn to abuse "" + ST as
the "fewest characters required" way to get interpolation, which is
"clever" in the same way that John's "empty \{}" trick is clever, but
not good for clarity.
- We need a story to tell for how to write good overloads, which
seems to be more subtle than initially thought.
- If the only way to make a StringTemplate is the literal syntax,
then STs gain a valuable security property: all fragments in the ST
are strings that appeared literally in code, and therefore untainted.
This is probably too restrictive but we should be aware of what we are
giving up as we explore the API options.
- Processors should be encouraged to "flatten" embedded STs.
A few people have implied that only the tainted parts of an ST (the
embedded expressions) need special processing, but I'll point out that
the untainted parts may often require domain-specific validation. For
example, a ST representing a SQL query wants balanced quotes, and
might want to require quotes around embedded expressions.
On 3/8/2024 1:35 PM, Brian Goetz wrote:
Time to check in with where were are with String Templates. We’ve
gone through two rounds of preview, and have received some feedback.
As a reminder, the primary goal of gathering feedback is to learn
things about the design or implementation that we don’t already know.
This could be bug reports, experience reports, code review, careful
analysis, novel alternatives, etc. And the best feedback usually
comes from using the feature “in anger” — trying to actually write
code with it. (“Some people would prefer a different syntax” or “some
people would prefer we focused on string interpolation only” fall
squarely in the “things we already knew” camp.)
In the course of using this feature in the `jextract` project, we did
learn quite a few things we didn’t already know, and this was
conclusive enough that it has motivated us to adjust our approach in
this feature. Specifically, the role of processors is “outsized” to
the value they offer, and, after further exploration, we now believe
it is possible to achieve the goals of the feature without an
explicit “processor” abstraction at all! This is a very positive
development.
First, I want to affirm that that the goals of the project have not
changed. From JEP 459:
Goals
• Simplify the writing of Java programs by making it easy to express
strings that include values computed at run time.
• Enhance the readability of expressions that mix text and
expressions, whether the text fits on a single source line (as with
string literals) or spans several source lines (as with text blocks).
• Improve the security of Java programs that compose strings from
user-provided values and pass them to other systems (e.g., building
queries for databases) by supporting validation and transformation of
both the template and the values of its embedded expressions.
• Retain flexibility by allowing Java libraries to define the
formatting syntax used in string templates.
• Simplify the use of APIs that accept strings written in non-Java
languages (e.g., SQL, XML, and JSON).
• Enable the creation of non-string values computed from literal text
and embedded expressions without having to transit through an
intermediate string representation.
Non-Goals
• It is not a goal to introduce syntactic sugar for Java's string
concatenation operator (+), since that would circumvent the goal of
validation.
• It is not a goal to deprecate or remove the StringBuilder and
StringBuffer classes, which have traditionally been used for complex
or programmatic string composition.
Another thing that has not changed is our view on the syntax for
embedding expressions. While many people did express the opinion of
“why not ‘just' do what Kotlin/Scala does”, this issue was more than
fully explored during the initial design round. (In fact, while
syntax disagreements are often purely subjective, this one was far
more clear — the $-syntax is objectively worse, and would be doubly
so if injected into an existing language where there were already
string literals in the wild. This has all been more than adequately
covered elsewhere, so I won’t rehash it here.)
Now, let’s talk about what we do think should change: the role of
processors and the StringTemplate type.
Processors were envisioned as a means to abstract the transformation
of templates to their final form (whether string, or something else.)
However, Java already has a well established means of abstracting
behavior: methods. (In fact, a processor application can be viewed
as merely a new syntax for a method call.) Our experience using the
feature highlighted the question: When converting a SQL query
expressed as a template to the form required by the database (such as
PreparedStatement), why do we need to say:
DB.”… template …”
When we could use an ordinary Java library:
Query q = Query.of(“…template…”)
Indeed, one of the worst things about having processors in the
language is that API designers are put in the difficult situation of
not knowing whether to write a processor or an ordinary API, and
often have to make that choice before the consequences are fully
understood. (To add to this, processors raise similar questions at
the use site.) But the real criticism here is that template capture
and processing are complected, when they should be separate,
composable features.
This motivated us to revisit some of the reasons why processors were
so central to the initial design in the first place. And it turned
out, this choice had been influenced — perhaps overly so — by early
implementation experiments. (One of the background design goals was
to enable expensive operations like `String::format` to be (much)
cheaper. Without digressing too deeply on performance,
String::format can be more than an order of magnitude worse than the
equivalent concatenation operation, and this in turn sometimes
motivates developers to use worse idioms for formatting. The FMT
processor brough that cost back in line with the equivalent
concatenation.) These early experiments biased the design towards
needing to know the processor at the point of template capture, but
upon reexamination we realized that there are other ways to achieve
the desired performance goals without requiring processors to be
known at capture time. This, in turn, enabled us to revisit a point
in the design space we had transited through earlier, where string
templates were “just a new kind of literal” and the job performed by
processors could instead be performed by ordinary APIs.
At this point, a simpler design and implementation emerged that met
the semantic, correctness, and performance goals: template literals
(“Hello \{name}”) are simply the literal form of StringTemplate:
StringTemplate st = “Hello \{name}”;
String and StringTemplate remain unrelated types. (We explored a
number of ways to interconvert them, but they caused more trouble
than they solved.) Processing of string templates, including
interpolation, is done by ordinary APIs that deal in StringTemplate,
aided by some clever implementation tricks to ensure good performance.
For APIs where interpolation is known to be safe in the domain, such
as PrintWriter, APIs can make that choice on behalf of the domain, by
providing overloads to embody this design choice:
void println(String) { … }
void println(StringTemplate) { … interpolate and delegate to
println(String) …. }
The upshot is that for interpolation-safe APIs like println, we can
use a template directly without giving up any safety:
System.out.println(“Hello \{name}”);
In this example, the string template evaluates to StringTemplate, not
String (no implicit interpolation), and chooses the StringTemplate
overload of println, which in turn chooses how to process the
template. This stays true to the design principle that interpolation
is dangerous enough that it should be an explicit choice in the code
— but it allows that choice to be made by libraries when the library
is comfortable doing so.
Similarly, the FMT processor is replaced by an overload of
String::format that interprets templates with embedded format
specifiers (e.g., “%d”):
String format(String formatString, Object… parameters) { … same as
today … }
String format(StringTemplate template) {... equivalent of FMT ...}
And users can call this as:
String s = String.format(“Hello %12s\{name}”);
Here, the String::format API has chosen to interpret string templates
according to the rules previously specified in the FMT processor (not
ordinary interpolation), but that choice is embedded in the library
semantics so no further explicit choice at the use site is required.
The user already chose to pass it to String::format; that’s all the
processing selection that is needed.
Where APIs do not express a choice of what template expansion means,
users continue to be free to process them explicitly before passing
them, using APIs that do (such as String::format or ordinary
interpolation.).
The result is:
- The need for use-site "goop" (previously, the processor name; now,
static or instance methods to process a template) goes away entirely
when dealing with libraries that are already template-friendly.
- Even with libraries that require use-site goop, it is no more
intrusive than before, and can be reduced over time as APIs get with
the program.
- StringTemplate is just another type that APIs can support if they
want. The "DB" processor becomes an ordinary factory method that
accepts a string template or an ordinary builder API.
- APIs now can have _more_ control over the timing and meaning of
template processing, because we are not biasing so strongly towards
early processing.
- It becomes easier to abstract over template processing (i.e.,
combine or manipulate templates as templates before processing)
- Interpolation remains an explicit choice, but ST-aware libraries
can make this choice on behalf of the user.
- The language feature and API surface get considerably smaller,
which is good. Core JDK APIs (e.g., println, format, exception
constructors) get upgraded to work with string templates.
The remaining question that everyone is probably asking is: “so how
do we do interpolation.” The answer there is “ordinary library
methods”. This might be a static method
(String.join(StringTemplate)) or an instance method
(template.join()), shed to be painted (but please, not right now.).
This is a sketch of direction, so feel free to pose
questions/comments on the direction. We’ll discuss the details as we
go.