The backslash prefix makes a lot of sense to me. Creating scenarios where I 
needed to toggle the raw-ness seemed forced.

The only awkwardness I see is with leading/trailing quotes. 

    """\"Cooked\""""
    """ "Raw" """.strip() or """"Raw""""

Cooked is fine with escapes. Raw could have a rule like: any quotes directly 
after the opening TQ sequence, or directly before the closing one, get added 
to the string. 
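To make that raw rule concrete, here is a rough sketch of how a scanner might apply it. The names (RawQuoteScan, body) are made up for illustration and this is not what javac does; it only assumes that in a run of three or more quotes the last three close the literal and any quotes before them are content.

```java
final class RawQuoteScan {
    /**
     * Returns the body of a raw triple-quote literal, given source text that
     * starts at the opening delimiter. Hypothetical rule: quotes adjacent to
     * the opening or closing delimiter belong to the string.
     */
    static String body(String src) {
        if (!src.startsWith("\"\"\"")) {
            throw new IllegalArgumentException("literal must start with \"\"\"");
        }
        StringBuilder body = new StringBuilder();
        int i = 3; // skip the opening delimiter
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c != '"') {
                body.append(c);
                i++;
                continue;
            }
            // Measure the run of consecutive quotes starting here.
            int run = 0;
            while (i + run < src.length() && src.charAt(i + run) == '"') {
                run++;
            }
            if (run >= 3) {
                // The last three quotes close the literal; the rest are content.
                for (int k = 0; k < run - 3; k++) {
                    body.append('"');
                }
                return body.toString();
            }
            // Fewer than three quotes: plain content.
            for (int k = 0; k < run; k++) {
                body.append('"');
            }
            i += run;
        }
        throw new IllegalArgumentException("unterminated literal");
    }
}
```

With this rule the four-quote form above yields "Raw" with its quotes intact, and no strip() call is needed.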

— Jim


> On Jan 6, 2019, at 1:43 PM, Brian Goetz <brian.go...@oracle.com> wrote:
> 
> As Reinier pointed out on amber-dev, regex strings may routinely contain 
> escaped meta-characters — +, *, brackets, etc.   So the embedded \- and \+ 
> story has an obvious conflict.  While these are not the only possible 
> characters for such “shift” operators, his point that this might be overkill 
> is a good one.  So let’s look at options for denoting raw-ness.  
> 
> - Just make triple-quote strings always raw as well as multi-line-capable; 
> regexes and friends would use TQ strings even though they are single line 
> (Scala, Kotlin)
> - Letter prefix, such as R"…" (C++, Rust)
> - Symbol prefix, such as @"…" (C#), or \"…" (suggestive of “distributing” the 
> escaping across the string.)  
> - Embedded escape sequence that switches to raw mode, but can’t be switched 
> back: "\+raw string", "\{raw}raw string".  
> 
> Data from Google suggests that, in their code base, on the order of 5% of 
> candidates for multi-line strings use some escape sequences (Kevin/Liam, can 
> you verify?)  This suggests to me that the “just use TQ” approach is vaguely 
> workable, but likely to be error-prone (5% is infrequent enough that people 
> will say \t when they mean tab and discover this at runtime, and then have to 
> go back and add a .escape() call.)  
> 
> (Of these, my current favorite is using the backslash: "cooked", """cooked 
> and ML-capable", \"raw", \"""raw and ML capable".  The use of \ suggests “the 
> backslashes have been pre-added for you”, building on existing associations 
> with backslash.)
> 
> Are there other credible candidates that I’ve missed?  
> 
> 
> 
>> On Jan 2, 2019, at 2:00 PM, Jim Laskey <james.las...@oracle.com> wrote:
>> 
>> 
>>> 
>>> http://cr.openjdk.java.net/~jlaskey/Strings/RTL2/index.html
>>> http://cr.openjdk.java.net/~jlaskey/Strings/RTL2.pdf
>>> 
>>> First of all, I would like to apologize for leading us down the garden path 
>>> re Java Raw String Literals. I jumped into this feature fully enamoured 
>>> with the JavaScript equivalent and, "why can't we have this in Java?"  As 
>>> the proposal evolved, it became clear that what we came up with was not a 
>>> good Java solution. I underestimated the concern that the original proposal 
>>> was too left field and did not fit into Java very well. It's somewhat 
>>> ironic that the backtick looks like a thorn.
>>> 
>>> So, let's start the new year with a structured approach to the enhanced 
>>> string literal design. Brian gave a summary of why the old design fails. 
>>> Starting with this summary, Brian and I talked out a series of critical 
>>> decision points that should be given thought, if not answers, before we 
>>> propose a new design. As an exercise, I supplemented these points and 
>>> created a series of small decision trees (a full on decision tree would be 
>>> complex and not very helpful.) I found these trees good intuition pumps for 
>>> getting the design at least 80% there. Hopefully, this exercise will help 
>>> you in the same way.
>>> 
>>> 
>>> 
>>> 
>>> Even the label Raw String Literal put the emphasis on the wrong part of the 
>>> feature. What developers really want is multi-line strings. They want to be 
>>> able to paste alien source into their Java programs with as little fuss as 
>>> possible.
>>> 
>>> String raw-ness (not translating escapes) is a tangential aspect, that may 
>>> or may not be needed to implement multi-line strings. Yes, the regex and 
>>> Windows file path arguments in JEP 326 are still valid, but this aspect 
>>> needs to be separated from the main part of the design. Further in the 
>>> discussion, we'll see that raw-ness is really a many-headed hydra, best 
>>> slain one head at a time.
>>> 
>>> 
>>> 
>>> 
>>> We have to be honest. We know Java's primary market. Sure we want to embed 
>>> Java in Java for writing tests. Sure there is JavaScript and CSS in web 
>>> pages. Nevertheless, most uses of multi-line will be for non-complex 
>>> grammars. Specifically, grammars that don't require special handling of 
>>> multi-character delimiter sequences. If you can accept this, then the 
>>> solution set is much smaller.
>>> 
>>> 
>>> 
>>> 
>>> This is an easy one. Familiarity is key to feature education. Radical 
>>> wandering off with new syntax is not helpful to anyone but bloggers and 
>>> authors.
>>> 
>>> 
>>> 
>>> 
>>> If you buy into the familiarity argument, then double quote is really the 
>>> only choice for a delimiter. Double quote already indicates a string literal. 
>>> Single quote indicates a character. We don’t want to gratuitously burn 
>>> unused symbols like backtick. Backslash works for regex but maybe not for 
>>> others. Combinations and nonces just introduce new noise when our original 
>>> goal was to reduce noise and complexity.
>>> 
>>> 
>>> 
>>> 
>>> Other languages avoid delimiter escape sequences by doubling up. For example, 
>>> "abc""def" -> abc"def. This concept is unfamiliar to Java developers, so why 
>>> change now? Escape sequences are what we know.
>>> 
>>> 
>>> 
>>> 
>>> Language designers got very nervous when I suggested infinite delimiter 
>>> sequences in the original proposal; lexically sacrilegious. I felt strongly 
>>> that it was easy to explain and only 1 in 1M developers would ever use more 
>>> than 4-5 character delimiter sequences. In round two, I have come to agree. 
>>> This was taking on more complexity than is really warranted, for a use case 
>>> that doesn’t come along very often. I suggest we only need single and 
>>> triple double quotes. A single double quote works today, so no argument 
>>> there. Double double quotes means empty string, no problem. Triple double 
>>> quotes are only necessary to avoid having to escape quotes in alien source.
>>> 
>>> String json = """
>>>               {
>>>                 "name": "Jean Smith",
>>>                 "age": 32,
>>>                 "location": "San Jose"
>>>               }
>>>             """;
>>> 
>>> versus
>>> 
>>> String json = "
>>>               {
>>>                 \"name\": \"Jean Smith\",
>>>                 \"age\": 32,
>>>                 \"location\": \"San Jose\"
>>>               }
>>>             ";
>>> 
>>> This second case is where we wandered off the tracks with raw-ness. We 
>>> assumed raw-ness is necessary to avoid all the backslashes. Most cases can 
>>> be handled with triple double quotes.
>>> 
>>> Okay, so why not more combinations? Simply because, most of the time they 
>>> are not needed. On the rare occasion we do have nested triple double 
>>> quotes, we can then use escape sequences.
>>> 
>>> String nestedJSON = """
>>>                       \"\"\"
>>>                       {
>>>                         "name": "Jean Smith",
>>>                         "age": 32,
>>>                         "location": "San Jose"
>>>                       }
>>>                       \"\"\";
>>>                   """;
>>> 
>>> or better yet, you only have to escape every third double quote
>>> 
>>> String nestedJSON = """
>>>                       \"""
>>>                       {
>>>                         "name": "Jean Smith",
>>>                         "age": 32,
>>>                         "location": "San Jose"
>>>                       }
>>>                       \""";
>>>                   """;
>>> 
>>> Not so evil and it's familiar.
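The "every third double quote" idea can be sketched as a small helper that prepares alien text for pasting between triple double quotes. TripleQuoteEscaper and escape are hypothetical names, not part of any proposal. As a simplification, this version escapes the third quote of each run rather than the first as in the example above, and it does not deal with backslashes in the content or with content that ends in a quote right before the closing delimiter.

```java
final class TripleQuoteEscaper {
    /**
     * Inserts a backslash before every third consecutive double quote so the
     * result never presents three unescaped quotes in a row to the scanner.
     */
    static String escape(String content) {
        StringBuilder out = new StringBuilder();
        int run = 0; // unescaped quotes seen in the current run
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c == '"') {
                run++;
                if (run == 3) {
                    out.append('\\'); // escape this quote, start a new run
                    run = 0;
                }
            } else {
                run = 0;
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Ordinary JSON, where quotes come one at a time, passes through unchanged; only a nested triple-quote sequence picks up a backslash.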
>>> 
>>> 
>>> 
>>> 
>>> Meaning, you can only use single double quotes for simple strings and triple 
>>> double quotes for multi-line strings. I don't have a strong opinion other than 
>>> it seems like an unneeded restriction. The only argument I've heard has been 
>>> for better error recovery when missing a close delimiter during parsing. My 
>>> counter for that argument is that if you are processing multi-line strings 
>>> then you can easily track the first newline after the opening delimiter and 
>>> recover from there. I implemented that recovery in javac and it worked out 
>>> well.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Cooked (translated escape sequences) should be the default. Why should a 
>>> multi-line string be different than a simple string?  We have a solution 
>>> for embedding double quote. Single quotes don't require escaping. Tabs and 
>>> newlines can exist as is. Unicode characters can be either an escape 
>>> sequence or the unicode character. So the only problem case is backslash. I 
>>> would argue that the rare backslash can be escaped. If not, then the 
>>> developer can use the raw-ness solution.
>>> 
>>> 
>>> 
>>> 
>>> If we don't translate newlines, then source is not transferable across 
>>> platforms. That is, a source from one platform may not execute the same way 
>>> on another platform. Translating consistently guarantees execution 
>>> consistency. As a note, programming languages that didn't translate 
>>> newlines in multi-line string literals typically regretted it later 
>>> (Python.)
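As a minimal sketch of what consistent newline translation could mean in practice, assuming the chosen normal form is a single line feed (the thread does not fix that detail, and NewlineNormalizer is a made-up name):

```java
final class NewlineNormalizer {
    /**
     * Maps Windows (CR LF) and bare CR line endings to LF, so the same
     * source text produces the same string value on every platform.
     */
    static String normalize(String s) {
        // Replace CR LF pairs first so the lone-CR pass cannot double them.
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }
}
```

A file checked out with CR LF endings and the same file with LF endings would then produce identical string contents.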
>>> 
>>> 
>>> 
>>> 
>>> With the original Raw String Literal proposal, there was concern about 
>>> leading and trailing nested delimiters. If we default to cooked strings, 
>>> then we can use \".
>>> 
>>> 
>>> 
>>> 
>>> These questions have been answered numerous times and fall into the realm 
>>> of library support. Same arguments as before, same outcome.
>>> 
>>> 
>>> To summarize the bold paths at this point:
>>>    - multi-line strings are an extension of traditional simple strings
>>>    - newlines in a string are no longer an error and the string can extend 
>>> across several lines
>>>    - error recovery can pick up at the first newline after the opening 
>>> delimiter 
>>>    - multi-line strings process escape sequences (including unicode) in the 
>>> same way as simple strings
>>>    - multiple double quotes are handled with escape sequences
>>>    - triple double quote delimiter is introduced to avoid escaping simple 
>>> double quote sequences
>>>    
>>> Generally, I think this is very much in the traditional Java spirit.
>>> 
>>> 
>>> Now, let's move on to the lesser but more interesting issue. As I stated 
>>> above, raw-ness is a multi-headed beast. Raw-ness involves turning off 
>>> the translation of 
>>>    - escape sequences
>>>    - unicode escapes
>>>    - delimiter sequences
>>>    - escape sequence prefix (backslash)
>>>    - tabs and newlines (control characters in general)
>>> 
>>> Sometimes we need all of the translations, sometimes few and sometimes 
>>> none. In the multi-line discussion above, we see we don't need raw as much 
>>> as we might have expected. Maybe for occasional backslashes, as in regex 
>>> and Windows path strings.
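For reference, this is what the regex and Windows path cases cost with today's cooked string literals. It uses only existing Java; the class and constant names are just for illustration.

```java
final class BackslashCost {
    // The regex \w\w\w\w needs every backslash doubled in a cooked literal.
    static final String WORD4 = "\\w\\w\\w\\w";

    // The path C:\Program Files\foo needs the same treatment.
    static final String PATH = "C:\\Program Files\\foo";

    public static void main(String[] args) {
        System.out.println("this".matches(WORD4)); // four word characters
        System.out.println(PATH);
    }
}
```

Every backslash that reaches the regex engine or the file system is written twice in the source, which is the noise a raw form would remove.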
>>> 
>>> 
>>> 
>>> 
>>> 
>>> The original Raw String Literal proposal suggested that raw-ness was a 
>>> property of the whole string literal and thus we proposed an alternate 
>>> delimiter syntax just to emphasize that fact. If we accept the bold path of 
>>> multi-line discussion above, then alternate delimiter is out. This leaves 
>>> prefixing as the best option to bless a string literal with raw-ness. 
>>> 
>>> At this point, I would like to suggest an alternate, maybe progressive way 
>>> to think of raw-ness. Since the original proposal, I have been thinking of 
>>> raw-ness as a state of processing the literal. State is certainly obvious 
>>> in the scanner implementation, why not raise that to the language level? If 
>>> it is a state then we should be able to enter and leave that state in some 
>>> way. Escape sequences are an obvious way of transitioning translation in 
>>> the string. \- and \+ are available and not currently recognized as valid 
>>> escape sequences, why not \- and \+ to toggle escape processing?
>>> 
>>>   String a = "cooked \-raw\+ cooked";  // cooked raw cooked - a little odd but not so much so
>>>   String b = "abc\-\\\\\+def";         // abc\\\\def - struggling
>>>   String c = "\-abc\\\\def";           // abc\\\\def - more readable as an inner prefix
>>>   String d = "abc\-\-def\+\+ghi";      // abc\-def\+ghi - raw on "\-" is "\" and "-", raw off "\+" is "\" and "+"
>>>   String e = """\-"abc"\+""";          // "abc" - \- and \+ act as no-ops of sorts
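Treating raw-ness as scanner state can be sketched as a small translator over the literal's body (the text between the delimiters). This is only an illustration of the \- and \+ idea under stated assumptions: RawToggleSketch is a made-up name, only a handful of cooked escapes are handled, and unknown escapes are kept as written where a real compiler would report an error.

```java
final class RawToggleSketch {
    /** Translates a literal body, honoring \- (enter raw) and \+ (leave raw). */
    static String translate(String body) {
        StringBuilder out = new StringBuilder();
        boolean raw = false;
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c != '\\' || i + 1 == body.length()) {
                out.append(c); // ordinary character, or a trailing backslash
                i++;
                continue;
            }
            char next = body.charAt(i + 1);
            if (raw) {
                if (next == '+') {
                    raw = false;   // \+ leaves raw mode and emits nothing
                    i += 2;
                } else {
                    out.append(c); // any other backslash is literal when raw
                    i++;
                }
            } else {
                switch (next) {
                    case '-':  raw = true;        break; // \- enters raw mode
                    case 'n':  out.append('\n');  break;
                    case 't':  out.append('\t');  break;
                    case '\\': out.append('\\');  break;
                    case '"':  out.append('"');   break;
                    default:   out.append(c).append(next); // kept as written
                }
                i += 2;
            }
        }
        return out.toString();
    }
}
```

Fed the bodies of strings a through e above, this sketch produces the results given in their comments, including the case where \- inside raw text and \+ outside it stay literal.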
>>> 
>>> Comparing property vs state:
>>> 
>>>   Runtime.getRuntime().exec(R""" "C:\Program Files\foo" bar""".strip());
>>>   Runtime.getRuntime().exec("""\-"C:\Program Files\foo" bar""");
>>> 
>>>   System.out.println("this".matches(R"\w\w\w\w"));
>>>   System.out.println("this".matches("\-\w\w\w\w"));
>>> 
>>>   String html = R"""
>>>                      <html>
>>>                          <body>
>>>                              <p>Hello World.</p>
>>>                          </body>
>>>                      </html>
>>>                 """.align();
>>>   String html = """\-
>>>                      <html>
>>>                          <body>
>>>                              <p>Hello World.</p>
>>>                          </body>
>>>                      </html>
>>>                 """.align();
>>> 
>>> 
>>>   String nested = """
>>>                       String EXAMPLE_TEST = "This is my small example "
>>>                               + "string which I'm going to "
>>>                               + "use for pattern matching.";
>>>                   """ +
>>>                   R"""
>>>                       System.out.println(EXAMPLE_TEST.replaceAll("\\s+", "\t"));
>>>                   """;
>>>   String nested = """
>>>                       String EXAMPLE_TEST = "This is my small example "
>>>                               + "string which I'm going to "
>>>                               + "use for pattern matching.";
>>>                       \-
>>>                       System.out.println(EXAMPLE_TEST.replaceAll("\\s+", "\t"));
>>>                       \+
>>>                   """;
>>> 
>>> Hopefully, this is a good starting point for discussion. As before, I'm 
>>> pragmatic about which direction we go, so feel free to comment.
>>> 
>>> Cheers,
>>> 
>>> -- Jim