String literals: some principles

Brian Goetz Sun, 28 Apr 2019 13:33:10 -0700

I would like to point out a key principle that has guided this second round of 
exploration on string literals, and mention how it might guide the next round 
(without actually diving into that round).


Classic string literals and the new “fat” string literals are now 
recognizable as variations on the same feature, each adapted to its niche 
(single-line vs. multi-line). The “escape language” supported by both is 
identical, and should stay that way; the only differences are the delimiter 
and the handling of artifacts of embedding a snippet of foreign text in a 
traditionally-indented Java program.  (Even their delimiters are similar.) 
This is a good thing.

Looking ahead to the next round, we can build on this.  In the first round, we 
mistakenly thought that there was something that could reasonably be called a 
“raw” string, but this notion is a fantasy; no string literal is so raw that it 
can’t recognize its closing delimiter.  So “rawness” is really only a matter of 
degree.  

We can characterize a string literal language as:

 - Opening delimiter
 - Closing delimiter
 - Escape characters, if any
 - Escape sublanguages, if any

That is, we process ordinary characters until we encounter either the closing 
delimiter, or one of the escape characters.  When we encounter an escape 
character, we process a “program” from the escape language, and then go back to 
processing ordinary characters.  
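
The loop just described can be sketched in Java as follows. This is only an illustration of the four-part characterization, not the compiler's lexer: the class name LiteralLanguage, its fields, and the decode method are hypothetical, and the escape sublanguage is reduced to a lookup table of one-character "programs".

```java
import java.util.Map;

// Hypothetical model of a "string literal language": opening delimiter,
// closing delimiter, escape character, and an escape sublanguage (here just
// a table of one-character programs; a real sublanguage is richer).
final class LiteralLanguage {
    final String open;
    final String close;
    final String escape;
    final Map<Character, String> programs;

    LiteralLanguage(String open, String close, String escape,
                    Map<Character, String> programs) {
        this.open = open;
        this.close = close;
        this.escape = escape;
        this.programs = programs;
    }

    // Decodes a literal that starts at the beginning of src. Anything after
    // the closing delimiter is ignored.
    String decode(String src) {
        if (!src.startsWith(open)) {
            throw new IllegalArgumentException("missing opening delimiter");
        }
        StringBuilder out = new StringBuilder();
        int i = open.length();
        while (i < src.length()) {
            if (src.startsWith(close, i)) {
                return out.toString();              // closing delimiter: done
            }
            if (src.startsWith(escape, i)) {
                i += escape.length();               // switch to the escape language
                if (i >= src.length()) {
                    break;                          // dangling escape at end of input
                }
                String expansion = programs.get(src.charAt(i));
                if (expansion == null) {
                    throw new IllegalArgumentException(
                        "unknown escape program: " + src.charAt(i));
                }
                out.append(expansion);
                i++;                                // back to ordinary characters
            } else {
                out.append(src.charAt(i));          // ordinary character
                i++;
            }
        }
        throw new IllegalArgumentException("unterminated literal");
    }

    public static void main(String[] args) {
        LiteralLanguage classic = new LiteralLanguage("\"", "\"", "\\",
            Map.of('n', "\n", 't', "\t", '"', "\"", '\\', "\\"));
        // The source text being decoded is: "a\tb\"c"
        System.out.println(classic.decode("\"a\\tb\\\"c\""));
    }
}
```

Note that even this scanner has to recognize its closing delimiter, which is the sense in which no literal is fully raw.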

Classic string literals have opening and closing delimiters of ", an escape 
character of \, and an escape language that includes “programs” like:

    n - newline
    t - tab
    nnn - octal escape (one to three octal digits)
    " - quote character
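
For concreteness, here is a small example of those programs in an ordinary Java string literal. The class and method names are made up for the illustration; the escapes themselves are standard Java.

```java
final class ClassicEscapes {
    // Each backslash hands control to the escape language for one "program":
    // \t (tab), \n (newline), \101 (octal for 'A'), and \" (quote character).
    static String sample() {
        return "a\tb\n\101\"";
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println(s.length());          // prints 6
        System.out.println(s.charAt(4) == 'A');  // prints true
    }
}
```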

Fat string literals are the same, except that the opening and closing 
delimiters are """.  But we keep the same escape language.  This is valuable.  
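
A short illustration of "same escape language, different delimiter" follows. It assumes the multi-line form as it later shipped in Java 15 text blocks, which may differ in detail from the design under discussion in this thread; the class and method names are hypothetical.

```java
// Requires Java 15 or later (text blocks).
final class SameEscapes {
    // Classic literal: delimiter ", escape character \.
    static String classic() {
        return "name:\t\"fat\"\n";
    }

    // Multi-line literal: delimiter """, same escape character and language.
    // Incidental indentation is stripped; the quotes need no escaping here.
    static String fat() {
        return """
            name:\t"fat"
            """;
    }

    public static void main(String[] args) {
        System.out.println(classic().equals(fat())); // prints true
    }
}
```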

It is worth asking explicitly: do we want to keep the same escape character 
too?  Guy has suggested offline that we might consider \\\ as the escape 
character for fat strings.  

Looking ahead (but please, let’s not open this discussion now), one of the 
tools we have at hand for representing degrees of “raw-ness” is that, as we 
“strengthen” the delimiter, we also strengthen the escape character at the same 
rate, but keep the escape language intact.  This would allow raw strings to be 
yet another projection of the same basic string literal feature, while 
requiring increasingly explicit action on the part of the user to access the 
escape language.  
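
To make the idea concrete without proposing any syntax, here is a rough sketch. It assumes a purely hypothetical "strength" n, where n quote characters delimit the literal and n backslashes introduce an escape; the class StrengthSketch and its methods are invented for the illustration, and the escape language is cut down to four programs.

```java
final class StrengthSketch {
    // Hypothetical: at strength n, the delimiter is n quote characters and
    // the escape is n backslashes. The escape language itself never changes.
    static String decode(String src, int strength) {
        String delim = "\"".repeat(strength);
        String escape = "\\".repeat(strength);
        if (!src.startsWith(delim)) {
            throw new IllegalArgumentException("missing opening delimiter");
        }
        StringBuilder out = new StringBuilder();
        int i = delim.length();
        while (i < src.length()) {
            if (src.startsWith(delim, i)) {
                return out.toString();
            }
            if (src.startsWith(escape, i) && i + escape.length() < src.length()) {
                i += escape.length();
                out.append(runProgram(src.charAt(i)));
                i++;
            } else {
                // Fewer than n backslashes (or fewer than n quotes) are
                // ordinary characters at this strength.
                out.append(src.charAt(i));
                i++;
            }
        }
        throw new IllegalArgumentException("unterminated literal");
    }

    // The shared escape language, reduced to four programs.
    static char runProgram(char c) {
        switch (c) {
            case 'n': return '\n';
            case 't': return '\t';
            case '"': return '"';
            case '\\': return '\\';
            default:
                throw new IllegalArgumentException("unknown escape program: " + c);
        }
    }

    public static void main(String[] args) {
        // Strength 1 source text: "C:\\temp\n"
        String weak = decode("\"C:\\\\temp\\n\"", 1);
        // Strength 3 source text: """C:\temp\\\n"""
        String strong = decode("\"\"\"C:\\temp\\\\\\n\"\"\"", 3);
        System.out.println(weak.equals(strong)); // prints true
    }
}
```

At strength 3 a lone backslash passes through untouched, so the user has to work harder to reach the escape language, which is the "degree of rawness" in question.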

I bring this up not because I want to talk about raw-ness now (getting the 
hint?), but because I want to keep all the variations of string literals as 
lightly-varying projections of the same basic feature.  It has come up, for 
example, that we might treat \<newline> differently in multi-line strings than 
in classic strings, but I would prefer that we not tinker with the escape 
language in nonuniform ways, as this minimizes the variations between the 
various sub-features.  So I offer this peek down the road as a means of 
soliciting discussion on the pros and cons of keeping \ as our escape character.
