Re: Update on String Templates (JEP 459)

Guy Steele Tue, 12 Mar 2024 10:54:30 -0700

I think I got my description of (2a) slightly wrong. Let me try again:

—————
(2a) “foo” is only a string, not a string template. In the absence of a special 
conversion, once again we are led to recommend that APIs provide pairs of 
methods, and I think we avoid most of the overloading problems that Maurizio 
has described. I note that instead of


   StringTemplate x = “”;

we could recommend

  StringTemplate x = StringTemplate.EMPTY;

where StringTemplate provides a public static member named EMPTY. We do have 
the burden of explaining to users that “foo” is not a string template.
—————
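
To make the EMPTY suggestion in (2a) concrete, here is a minimal sketch in today's Java. Template is a hypothetical stand-in for StringTemplate (a record of literal fragments and embedded values), and plus is an assumed combining operation that nobody in this thread has proposed. The only point is that a named constant gives an accumulator a non-null starting value without needing the literal “” to be a template.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for StringTemplate: n + 1 literal fragments around n values.
record Template(List<String> fragments, List<Object> values) {

    // The degenerate template: one empty fragment and no embedded values.
    static final Template EMPTY = new Template(List.of(""), List.of());

    // Assumed helper: joins two templates while keeping the result a template.
    Template plus(Template other) {
        int last = fragments.size() - 1;
        List<String> f = new ArrayList<>(fragments.subList(0, last));
        // The last fragment of this template fuses with the first fragment of the other.
        f.add(fragments.get(last) + other.fragments.get(0));
        f.addAll(other.fragments.subList(1, other.fragments.size()));
        List<Object> v = new ArrayList<>(values);
        v.addAll(other.values);
        return new Template(f, v);
    }
}

class EmptyTemplateDemo {
    // Builds one template out of many, starting from EMPTY rather than from "".
    static Template bracketAll(List<String> names) {
        Template acc = Template.EMPTY;
        for (String n : names) {
            // Stand-in for the template literal "[\{n}]".
            acc = acc.plus(new Template(List.of("[", "]"), List.of(n)));
        }
        return acc;
    }

    public static void main(String[] args) {
        Template t = bracketAll(List.of("id", "name"));
        System.out.println(t.fragments());
        System.out.println(t.values());
    }
}
```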

Now, all that said, I will now provide my best attempt to support the idea that 
(2b) is better than (2a):

It will be difficult to explain to the user a design such that the syntax of 
strings appears to be an obvious edge case of the syntax of string templates 
(the case where the number of interpolated expressions is zero) but the 
semantics of strings are not the obvious and analogous edge case of the 
semantics of string templates.

(1) avoids this problem by making the syntaxes different. (2b) avoids the 
problem by making the semantics match. But (2a) totally has this problem.


On Mar 12, 2024, at 1:41 PM, Guy Steele wrote:

Now that Maurizio has provided a detailed explanation of prior investigations 
and some good examples, I am now convinced that the approach of providing a 
special-case conversion from String to StringTemplate is probably not a good 
idea.

Then here is the decision tree that I would suggest:

(1) If we decide that we do want, on its own merits, some up-front visual 
indication that distinguishes string literals from string templates, then it 
becomes easier to just say that strings and string templates are different 
beasts, neither a subtype of the other, and in particular (a) $”foo” (for 
example) is a degenerate string template, which is not the same as the string 
“foo”; (b) $”” is a simple way to write an empty string template, in case you 
need to initialize a variable of type StringTemplate to something non-null; and 
(c) APIs should consider providing pairs of methods, where in each pair one 
takes a String argument and the other takes a StringTemplate argument.

(2) If we decide we do not want that visual distinction, then we have the 
problem of whether “foo” can be used as both a string and a string template.

(2a) “foo” is only a string, not a string template. This leads to some of the 
overloading problems that Maurizio has described, though I note that instead of

   StringTemplate x = “”;

we could recommend

  StringTemplate x = StringTemplate.EMPTY;

where StringTemplate provides a public static member named EMPTY.

(2b) “foo” can be used as both a string and a string template. In the absence 
of a special conversion, this would seem to require that String <: 
StringTemplate as Tagir suggests.


On Mar 12, 2024, at 1:08 PM, Brian Goetz wrote:

OK, so let's summarize the EG discussion so far.  (As a reminder, syntax-heavy 
features like this are even more subject to "armchair theorization" than most, 
so please, take that into account when commenting.  As a further reminder, the 
best thing we could do right now is write more API code that manipulates string 
templates.)

Overall, I think everyone agrees that the "make string templates the star of 
the show" approach is a winning direction.  No one seems too busted up at the 
loss of processors.

I'm going to try and focus for now on "potential problems that might prompt 
further adjustment", rather than specific solutions.

There is some ambient discomfort that the "sublanguage" of a template becomes a 
dynamic property of a template, introducing new opportunities for users to make 
mistakes with unprocessed templates.  (This was present before as well using 
the RAW processor, but much less prominent.)  But, I don't think this is a 
significant issue; it's just something new to get used to.

Most of the concerns have to do with the visual similarity between string 
literals and template literals.  While this is of course intended, there are 
some concerns that they may be "too similar".  Concerns raised include:

 - In a code-generation scenario that leans on templates, sometimes we want to 
use a string literal as a degenerate form of template.  It may be surprising 
that this doesn't "just work", and alternatives (e.g., conversion functions, 
casting, etc) may have varying degrees of discoverability and yuck-factor.

 - Given (a) the visual similarity of string and template literals and (b) the 
lenient treatment of concatenation between strings and everything else, users 
may well be tempted to concatenate string literals with template literals, and 
may be surprised at the outcome.

 - Because template literals may be broad and wide, and their evaluation may 
involve side effects, we may want to give a lexical heads-up of "weird thing 
coming", rather than having template literals be framed more like "strings with 
benefits."
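
The concatenation concern can be illustrated with Java as it exists today. This is only a sketch: Template is a hypothetical record standing in for StringTemplate, and what + would actually do with a real template literal is not settled anywhere in this thread. It shows the existing rule that string concatenation converts a non-String operand through toString(), which is not the same thing as interpolating it.

```java
import java.util.List;

// Hypothetical stand-in for StringTemplate: n + 1 literal fragments around n values.
record Template(List<String> fragments, List<Object> values) {

    // Plain interpolation: alternate fragments and values.
    String interpolate() {
        StringBuilder sb = new StringBuilder(fragments.get(0));
        for (int i = 0; i < values.size(); i++) {
            sb.append(values.get(i)).append(fragments.get(i + 1));
        }
        return sb.toString();
    }
}

class ConcatSurpriseDemo {
    // What a user might write by analogy with string literals.
    static String tempting(Template t) {
        return "Log: " + t; // string conversion calls t.toString(), not interpolate()
    }

    // What the user most likely meant.
    static String intended(Template t) {
        return "Log: " + t.interpolate();
    }

    public static void main(String[] args) {
        // Stand-in for the template literal "Hello \{name}!" with name = "Duke".
        Template greeting = new Template(List.of("Hello ", "!"), List.of("Duke"));
        System.out.println(tempting(greeting));
        System.out.println(intended(greeting));
    }
}
```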

Have I covered the concerns raised so far?

Before we get too caught up in solutions, let's try to get on the same page 
about which of these are problems that need to be solved right now.


(As a small matter of housekeeping, given that the preview train is already 
rolling, we will soon have to make a decision to (a) withdraw the current 
preview entirely, (b) re-preview the current design even though we know it will 
change, or (c) gain the requisite confidence in a new design in time to preview 
that.  From my vantage point, (c) is starting to look increasingly unlikely, 
and I suspect (a) is a better choice than (b).  But I bring this up not to 
start a project management discussion, so much as to raise awareness that 
there are project management constraints.)




On 3/8/2024 1:35 PM, Brian Goetz wrote:

Time to check in with where we are with String Templates.  We’ve gone through 
two rounds of preview, and have received some feedback.

As a reminder, the primary goal of gathering feedback is to learn things about 
the design or implementation that we don’t already know.  This could be bug 
reports, experience reports, code review, careful analysis, novel alternatives, 
etc.    And the best feedback usually comes from using the feature “in anger” — 
trying to actually write code with it.  (“Some people would prefer a different 
syntax” or “some people would prefer we focused on string interpolation only” 
fall squarely in the “things we already knew” camp.)

In the course of using this feature in the `jextract` project, we did learn 
quite a few things we didn’t already know, and this was conclusive enough that 
it has motivated us to adjust our approach in this feature.  Specifically, the 
role of processors is “outsized” to the value they offer, and, after further 
exploration, we now believe it is possible to achieve the goals of the feature 
without an explicit “processor” abstraction at all!  This is a very positive 
development.

First, I want to affirm that the goals of the project have not changed.  
From JEP 459:

Goals

• Simplify the writing of Java programs by making it easy to express strings 
that include values computed at run time.
• Enhance the readability of expressions that mix text and expressions, whether 
the text fits on a single source line (as with string literals) or spans 
several source lines (as with text blocks).
• Improve the security of Java programs that compose strings from user-provided 
values and pass them to other systems (e.g., building queries for databases) by 
supporting validation and transformation of both the template and the values of 
its embedded expressions.
• Retain flexibility by allowing Java libraries to define the formatting syntax 
used in string templates.
• Simplify the use of APIs that accept strings written in non-Java languages 
(e.g., SQL, XML, and JSON).
• Enable the creation of non-string values computed from literal text and 
embedded expressions without having to transit through an intermediate string 
representation.

Non-Goals
• It is not a goal to introduce syntactic sugar for Java's string concatenation 
operator (+), since that would circumvent the goal of validation.
• It is not a goal to deprecate or remove the StringBuilder and StringBuffer 
classes, which have traditionally been used for complex or programmatic string 
composition.

Another thing that has not changed is our view on the syntax for embedding 
expressions.  While many people did express the opinion of “why not ‘just’ do 
what Kotlin/Scala does”, this issue was more than fully explored during the 
initial design round.  (In fact, while syntax disagreements are often purely 
subjective, this one was far more clear — the $-syntax is objectively worse, 
and would be doubly so if injected into an existing language where there were 
already string literals in the wild.  This has all been more than adequately 
covered elsewhere, so I won’t rehash it here.)


Now, let’s talk about what we do think should change: the role of processors 
and the StringTemplate type.

Processors were envisioned as a means to abstract the transformation of 
templates to their final form (whether string, or something else.)  However, 
Java already has a well established means of abstracting behavior: methods.   
(In fact, a processor application can be viewed as merely a new syntax for a 
method call.)  Our experience using the feature highlighted the question: When 
converting a SQL query expressed as a template to the form required by the 
database (such as PreparedStatement), why do we need to say:

  DB.”… template …”

When we could use an ordinary Java library:

  Query q = Query.of(“…template…”)

Indeed, one of the worst things about having processors in the language is that 
API designers are put in the difficult situation of not knowing whether to 
write a processor or an ordinary API, and often have to make that choice before 
the consequences are fully understood.  (To add to this, processors raise 
similar questions at the use site.) But the real criticism here is that 
template capture and processing are complected, when they should be separate, 
composable features.
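
As a rough illustration of the "ordinary Java library" alternative, the sketch below shows one way a factory like Query.of could keep capture and processing separate. Everything here is an assumption made for illustration: Template is a hypothetical stand-in for StringTemplate, and this Query record is invented, not an API from jextract or the JDK. The embedded values never enter the SQL text; they are kept aside as bind parameters.

```java
import java.util.List;

// Hypothetical stand-in for StringTemplate: n + 1 literal fragments around n values.
record Template(List<String> fragments, List<Object> values) {}

// Hypothetical result type: SQL text with '?' placeholders plus the values to bind.
record Query(String sql, List<Object> params) {

    // An ordinary factory method doing the job a "DB" processor would have done.
    static Query of(Template t) {
        // Joining the fragments with '?' puts one placeholder where each value was.
        return new Query(String.join("?", t.fragments()), t.values());
    }
}

class QueryDemo {
    public static void main(String[] args) {
        String name = "Robert'); DROP TABLE students;--";
        // Stand-in for the template literal
        // "SELECT * FROM students WHERE name = \{name}".
        Template t = new Template(
                List.of("SELECT * FROM students WHERE name = ", ""),
                List.of(name));
        Query q = Query.of(t);
        System.out.println(q.sql());    // the hostile text is not part of the SQL
        System.out.println(q.params()); // it travels separately as a parameter
    }
}
```

A real library would presumably go on to bind each parameter with PreparedStatement.setObject; that step is omitted here so the sketch runs without a database.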

This motivated us to revisit some of the reasons why processors were so central 
to the initial design in the first place.  And it turned out, this choice had 
been influenced — perhaps overly so — by early implementation experiments.  
(One of the background design goals was to enable expensive operations like 
`String::format` to be (much) cheaper.  Without digressing too deeply on 
performance, String::format can be more than an order of magnitude worse than 
the equivalent concatenation operation, and this in turn sometimes motivates 
developers to use worse idioms for formatting.  The FMT processor brought that 
cost back in line with the equivalent concatenation.)  These early experiments 
biased the design towards needing to know the processor at the point of 
template capture, but upon reexamination we realized that there are other ways 
to achieve the desired performance goals without requiring processors to be 
known at capture time.  This, in turn, enabled us to revisit a point in the 
design space we had transited through earlier, where string templates were 
“just a new kind of literal” and the job performed by processors could instead 
be performed by ordinary APIs.

At this point, a simpler design and implementation emerged that met the 
semantic, correctness, and performance goals: template literals (“Hello 
\{name}”) are simply the literal form of StringTemplate:

  StringTemplate st = “Hello \{name}”;

String and StringTemplate remain unrelated types.  (We explored a number of 
ways to interconvert them, but they caused more trouble than they solved.)  
Processing of string templates, including interpolation, is done by ordinary 
APIs that deal in StringTemplate, aided by some clever implementation tricks to 
ensure good performance.

For APIs where interpolation is known to be safe in the domain, such as 
PrintWriter, APIs can make that choice on behalf of the domain, by providing 
overloads to embody this design choice:

   void println(String) { … }
   void println(StringTemplate) { … interpolate and delegate to println(String) … }

The upshot is that for interpolation-safe APIs like println, we can use a 
template directly without giving up any safety:

   System.out.println(“Hello \{name}”);

In this example, the string template evaluates to StringTemplate, not String 
(no implicit interpolation), and chooses the StringTemplate overload of 
println, which in turn chooses how to process the template.  This stays true to 
the design principle that interpolation is dangerous enough that it should be 
an explicit choice in the code — but it allows that choice to be made by 
libraries when the library is comfortable doing so.
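
The overload pair can be sketched in today's Java with stand-in types. Template and Printer below are hypothetical (Template plays the role of StringTemplate, Printer the role of PrintWriter), and interpolate() is an assumed name for the interpolation method, whose real name is still open. The point is that the two overloads take unrelated types, so the argument's type alone selects one, and the library makes the interpolation choice.

```java
import java.util.List;

// Hypothetical stand-in for StringTemplate: n + 1 literal fragments around n values.
record Template(List<String> fragments, List<Object> values) {

    // Plain interpolation: alternate fragments and values.
    String interpolate() {
        StringBuilder sb = new StringBuilder(fragments.get(0));
        for (int i = 0; i < values.size(); i++) {
            sb.append(values.get(i)).append(fragments.get(i + 1));
        }
        return sb.toString();
    }
}

// Hypothetical stand-in for an interpolation-safe API such as PrintWriter.
class Printer {
    private final StringBuilder out = new StringBuilder();

    void println(String s) {
        out.append(s).append('\n');
    }

    // The library, not the caller, decides that interpolation is safe here.
    void println(Template t) {
        println(t.interpolate());
    }

    String written() {
        return out.toString();
    }
}

class PrinterDemo {
    public static void main(String[] args) {
        String name = "Duke";
        Printer printer = new Printer();
        printer.println("plain string");                                     // String overload
        printer.println(new Template(List.of("Hello ", ""), List.of(name))); // Template overload
        System.out.print(printer.written());
    }
}
```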

Similarly, the FMT processor is replaced by an overload of String::format that 
interprets templates with embedded format specifiers (e.g., “%d”):

  String format(String formatString, Object… parameters) { … same as today … }
  String format(StringTemplate template) {... equivalent of FMT ...}

And users can call this as:

  String s = String.format(“Hello %12s\{name}”);

Here, the String::format API has chosen to interpret string templates according 
to the rules previously specified in the FMT processor (not ordinary 
interpolation), but that choice is embedded in the library semantics so no 
further explicit choice at the use site is required.  The user already chose to 
pass it to String::format; that’s all the processing selection that is needed.
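
A minimal sketch of how such an overload might interpret a template, assuming each format specifier sits at the end of the fragment that precedes its value (as in “Hello %12s\{name}”). Template and Formats are hypothetical names, and this is not the FMT implementation: it assumes every value has a specifier and does no escaping of stray % characters in the literal text.

```java
import java.util.List;

// Hypothetical stand-in for StringTemplate: n + 1 literal fragments around n values.
record Template(List<String> fragments, List<Object> values) {}

final class Formats {
    private Formats() {}

    // Stand-in for the proposed String.format(StringTemplate) overload.
    static String format(Template t) {
        // With each specifier trailing its fragment, concatenating the fragments
        // yields an ordinary format string whose arguments are the values, in order.
        String formatString = String.join("", t.fragments());
        return String.format(formatString, t.values().toArray());
    }
}

class FormatDemo {
    public static void main(String[] args) {
        String name = "Duke";
        // Stand-in for the template literal "Hello %12s\{name}".
        Template t = new Template(List.of("Hello %12s", ""), List.of(name));
        System.out.println(Formats.format(t)); // name is right-aligned in a 12-wide field
    }
}
```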

Where APIs do not express a choice of what template expansion means, users 
continue to be free to process them explicitly before passing them, using APIs 
that do (such as String::format or ordinary interpolation).

The result is:

- The need for use-site "goop" (previously, the processor name; now, static or 
instance methods to process a template) goes away entirely when dealing with 
libraries that are already template-friendly.
- Even with libraries that require use-site goop, it is no more intrusive than 
before, and can be reduced over time as APIs get with the program.
- StringTemplate is just another type that APIs can support if they want.  The 
"DB" processor becomes an ordinary factory method that accepts a string 
template or an ordinary builder API.
- APIs now can have _more_ control over the timing and meaning of template 
processing, because we are not biasing so strongly towards early processing.
- It becomes easier to abstract over template processing (i.e., combine or 
manipulate templates as templates before processing)
- Interpolation remains an explicit choice, but ST-aware libraries can make 
this choice on behalf of the user.
- The language feature and API surface get considerably smaller, which is good. 
 Core JDK APIs (e.g., println, format, exception constructors) get upgraded to 
work with string templates.

The remaining question that everyone is probably asking is: “so how do we do 
interpolation.”  The answer there is “ordinary library methods”.  This might be 
a static method (String.join(StringTemplate)) or an instance method 
(template.join()), shed to be painted (but please, not right now).

This is a sketch of direction, so feel free to pose questions/comments on the 
direction.  We’ll discuss the details as we go.
