> 
> http://cr.openjdk.java.net/~jlaskey/Strings/RTL2/index.html 
> <http://cr.openjdk.java.net/~jlaskey/Strings/RTL2/index.html>
> http://cr.openjdk.java.net/~jlaskey/Strings/RTL2.pdf 
> <http://cr.openjdk.java.net/~jlaskey/Strings/RTL2.pdf>
> First of all, I would like to apologize for leading us down the garden path 
> re Java Raw String Literals. I jumped into this feature fully enamoured with 
> the JavaScript equivalent and, "why can't we have this in Java?"  As the 
> proposal evolved, it became clear that what we came up with was not a good 
> Java solution. I underestimated the concern that the original proposal was 
> too left field and did not fit into Java very well. It's somewhat ironic that 
> the backtick looks like a thorn.
> 
> So, let's start the new year with a structured approach to the enhance string 
> literal design. Brian gave a summary of why the old design fails. Starting 
> with this summary, Brian and I talked out a series of critical decision 
> points that should be given thought, if not answers, before we propose a new 
> design. As an exercise, I supplemented these points and created a series of 
> small decision trees (a full on decision tree would be complex and not very 
> helpful.) I found these trees good intuition pumps for getting the design at 
> least 80% there. Hopefully, this exercise will help you in the same way.
> 
> 
> 
> 
> Even the label Raw String Literal put the emphasis on the wrong part of the 
> feature. What developers really want is multi-line strings. They want to be 
> able to paste alien source into their Java programs with as little fuss as 
> possible.
> 
> String raw-ness (not translating escapes) is a tangential aspect, that may or 
> may not be needed to implement multi-line strings. Yes, the regex and 
> Window's file path arguments in JEP 326 are still valid, but this aspect 
> needs to be separated from the main part of the design. Further in the 
> discussion, we'll see that raw-ness is really a many-headed hydra, best slain 
> one head at a time.
> 
> 
> 
> 
> We have to be honest. We know Java's primary market. Sure we want to embed 
> Java in Java for writing tests. Sure there is JavaScript and CSS in web 
> pages. Nevertheless, most uses of multi-line will be for non-complex 
> grammars. Specifically, grammars that don't require special handling of 
> multi-character delimiter sequences. If you can accept this, then the 
> solution set is much smaller.
> 
> 
> 
> 
> This is an easy one. Familiarity is key to feature education. Radical 
> wandering off with new syntax is not helpful to anyone but bloggers and 
> authors.
> 
> 
> 
> 
> If you buy into the familiarity argument, then double quote is really only 
> choice for a delimiter. Double quote already indicates a string literal. 
> Single quote indicates a character. We don’t want to gratuitously burn unused 
> symbols like backtick. Backslash works for regex but maybe not for others. 
> Combinations and nonces just introduce new noise when our original goal was 
> to reduce noise and complexity.
> 
> 
> 
> 
> Other languages avoid delimiter escape sequences by doubling up. Example, 
> "abc""def" -> abc"def. This concept is unfamiliar to Java developers, why 
> change now. Escape sequences are what we know.
> 
> 
> 
> 
> Language designers got very nervous when I suggested infinite delimiter 
> sequences in the original proposal; lexically sacrilegious. I felt strongly 
> that it was easy to explain and only 1 in 1M developers would ever use more 
> than 4-5 character delimiter sequences. In round two, I have come to agree. 
> This was taking on more complexity than is really warranted, for a use case 
> that doesn’t come along very often. I suggest we only need single and triple 
> double quotes. A single double quote works today, so no argument there. 
> Double double quotes means empty string, no problem. Triple double quotes are 
> only necessary to avoid having to escape quotes in alien source.
> 
> String json = """
>                 {
>                   "name": "Jean Smith",
>                   "age": 32,
>                   "location": "San Jose"
>                 }
>               """;
> 
> versus
> 
> String json = "
>                 {
>                   \"name\": \"Jean Smith\",
>                   \"age\": 32,
>                   \"location\": \"San Jose\"
>                 }
>               ";
> 
> This second case is where we wandered off the tracks with raw-ness. We 
> assumed raw-ness is necessary to avoid all the backslashes. Most cases can be 
> handled with triple double quotes.
> 
> Okay, so why not more combinations? Simply because, most of the time they are 
> not needed. On the rare occasion we do have nested triple double quotes, we 
> can then use escape sequences.
> 
> String nestedJSON = """
>                         \"\"\"
>                         {
>                           "name": "Jean Smith",
>                           "age": 32,
>                           "location": "San Jose"
>                         }
>                         \"\"\";
>                     """;
> 
> or better yet, you only have to escape every third double quote
> 
> String nestedJSON = """
>                         \"""
>                         {
>                           "name": "Jean Smith",
>                           "age": 32,
>                           "location": "San Jose"
>                         }
>                         \""";
>                     """;
> 
> Not so evil and it's familiar.
> 
> 
> 
> 
> Meaning, you can only use single quotes for simple strings and triple quotes 
> for multi-line strings. I don't have a strong opinion other than it seems 
> like an unneeded restriction. The only argument I've heard has been for 
> better error recovery when missing a close delimiter during parsing. My 
> counter for that argument is that if you are processing multi-line strings 
> then you can easily track the first newline after the opening delimiter and 
> recover from there. I implemented that recovery in javac and worked out well.
> 
> 
> 
> 
> 
> Cooked (translated escape sequences) should be the default. Why should a 
> multi-line string be different than a simple string?  We have a solution for 
> embedding double quote. Single quotes don't require escaping. Tabs and 
> newlines can exist as is. Unicode characters can be either an escape sequence 
> or the unicode character. So the only problem case is backslash. I would 
> argue that the rare backslash can be escaped. If not, then the developer can 
> use the raw-ness solution.
> 
> 
> 
> 
> If we don't translate newlines, then source is not transferable across 
> platforms. That is, a source from one platform may not execute the same way 
> on another platform. Translating consistently guarantees execution 
> consistency. As a note, programming languages that didn't translate newlines 
> in multi-line string literals typically regretted it later (Python.)
> 
> 
> 
> 
> With the original Raw String Literal proposal, there was concern about 
> leading and trailing nested delimiters. If we default to cooked strings, then 
> we use can use \".
> 
> 
> 
> 
> These questions have been answered numerous times and fall into the realm of 
> library support. Same arguments as before, same outcome.
> 
> 
> To summarize the bold paths at this point;
>       - multi-line strings are an extension of traditional simple strings
>       - newlines in a string are no longer an error and the string can extend 
> across several lines
>       - error recovery can pick up at the first newline after the opening 
> delimiter 
>       - multi-line strings process escape sequences (including unicode) in 
> the same way as simple strings
>       - multiple double quotes are handled with escape sequences
>       - triple double quote delimiter is introduced to avoid escaping simple 
> double quote sequences
>       
> Generally, I think this is very much in the traditional Java spirit.
> 
> 
> Now, let's move on to the lesser but more interesting issue. As I stated 
> above, raw-ness is a multi-headed beast. Raw-ness involves the turning off 
> the translation of 
>       - escape sequences
>       - unicode escapes
>       - delimiter sequences
>       - escape sequence prefix (backslash)
>       - tabs and newlines (control characters in general)
> 
> Sometimes we need all of the translations, sometimes few and sometimes none. 
> In the multi-line discussion above, we see we don't need raw as much as we 
> might have expected. Maybe for occasional backslashes, as in regex and 
> Windows paths strings.
> 
> 
> 
> 
> 
> The original Raw String Literal proposal suggested that raw-ness was a 
> property of the whole string literal and thus we proposed an alternate 
> delimiter syntax just to emphasize that fact. If we accept the bold path of 
> multi-line discussion above, then alternate delimiter is out. This leaves 
> prefixing as the best option to bless a string literal with raw-ness. 
> 
> At this point, I would like to suggest an alternate, maybe progressive way to 
> think of raw-ness. Since the original proposal, I have been thinking of 
> raw-ness as a state of processing the literal. State is certainly obvious in 
> the scanner implementation, why not raise that to the language level? If it 
> is a state then we should be able to enter and leave that state in some way. 
> Escape sequences are an obvious way of transitioning translation in the 
> string. \- and \+ are available and not currently recognized as valid escape 
> sequences, why not \- and \+ to toggle escape processing?
> 
>     String a = "cooked \-raw\+ cooked";  // cooked raw cooked - a little odd 
> but not so much so
>     String b = "abc\-\\\\\+def";         // abc\\\\def - struggling
>     String c = "\-abc\\\\def";           // abc\\\\def - more readable as an 
> inner prefix
>     String d = "abc\-\-def\+\+ghi";      // abc\-def\+ghi - raw on "\-" is 
> "\" and "-", raw off "\+" is "\" and "+"
>     String e = """\-"abc"\+""";          // "abc" - \- and \+ act a no-ops of 
> sorts
> 
> Comparing property vs state:
>     
>     Runtime.getRuntime().exec(R""" "C:\Program Files\foo" bar""".strip());
>     Runtime.getRuntime().exec("""\-"C:\Program Files\foo" bar""");
>     
>     System.out.println("this".matches(R"\w\w\w\w"));
>     System.out.println("this".matches("\-\w\w\w\w"));
> 
>     String html = R"""
>                        <html>
>                            <body>
>                                <p>Hello World.</p>
>                            </body>
>                        </html>
>                   """.align();
>     String html = """\-
>                        <html>
>                            <body>
>                                <p>Hello World.</p>
>                            </body>
>                        </html>
>                   """.align();
> 
> 
>     String nested = """
>                         String EXAMPLE_TEST = "This is my small example "
>                                 + "string which I'm going to "
>                                 + "use for pattern matching.";
>                     """ +
>                     R"""
>                         System.out.println(EXAMPLE_TEST.replaceAll("\\s+", 
> "\t"));
>                     """;
>     String nested = """
>                         String EXAMPLE_TEST = "This is my small example "
>                                 + "string which I'm going to "
>                                 + "use for pattern matching.";
>                         \-
>                         System.out.println(EXAMPLE_TEST.replaceAll("\\s+", 
> "\t"));
>                         \+
>                     """;
> 
> Hopefully, this is a good starting point for discussion. As before, I'm 
> pragmatic about which direction we go, so feel free to comment.
> 
> Cheers,
> 
> -- Jim
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 

Reply via email to