Kevin & Alan,

Do you have numbers from your RSL survey on this: of all string expressions that are candidates for translation to a multi-line string literal, what percentage contain no escapes other than quotes and newlines?
Thank you,

-- Jim

> On Apr 10, 2019, at 12:22 PM, Jim Laskey <james.las...@oracle.com> wrote:
>
> Next plate is (1a) incidental whitespace.
>
> Having decided that we are content with "fat" delimiters (""") for multi-line
> strings, we have some more choices to make regarding multi-line strings.
> (We're not going to talk about "raw" strings yet; let's finish the multi-line
> course first.)
>
> Multi-line strings are different from single-line strings in a number of
> ways, so let's get clear on what we want "multi-line" to mean.
>
> Line terminators: When strings span lines, they do so using the line
> terminators present in the source file, which may vary depending on the
> operating system on which the file was authored. Should this be an aspect of
> multi-line-ness, or should we normalize these to a standard line terminator?
> It seems a little weird to treat string literals quite so literally; the
> choice of line terminator is surely an incidental one. I think we're all
> comfortable saying "these should be normalized", but it's worth bringing this
> up because it is merely one way in which incidental artifacts of how the
> string is embedded in the source program force us to interpret what the user
> meant. Which brings us to the next incidental aspect...
>
> Whitespace: A multi-line string is nestled in the context of a Java source
> program. It is likely (though not guaranteed) that the indentation of lines
> has been distorted by the desire to make the embedded snippet align with the
> enclosing lines. Most of the time, there is some combination of incidental
> whitespace and intended whitespace. There are a number of algorithms by
> which we could try to intuit which whitespace the user intended. Which brings
> us to ask:
>
> - Assuming the existence of a reasonable algorithm for re-aligning text,
>   what should the _default_ be for the language? Should it assume the user
>   wants re-alignment, or make the user explicitly opt in?
> - If the choice is "automatically align", how would we indicate the desire
>   to opt out?
> - Should we limit what we do automatically to only what can be done by an
>   equivalent library routine?
>
> (Again, let's focus on the requirements and semantics and defaults first,
> before we bikeshed the syntax.)
>
> It's hard to answer the above without a clear understanding of the use cases.
> So, here's a partial catalog of examples; let's play "what was the user
> thinking", and see if we can agree on that.
>
> Examples:
>
> String a = """
> +--------+
> | text |
> +--------+
> """; // first characters in first column?
>
> String b = """
> +--------+
> | text |
> +--------+
> """; // first characters in first column or indented four spaces?
>
> String c = """
> +--------+
> | text |
> +--------+
> """; // first characters in first column or indented several?
>
> String d = """
>     +--------+
>     | text |
>     +--------+
> """; // first characters in first column or indented four?
>
> String e =
> """
> +--------+
> | text |
> +--------+
> """; // heredoc?
>
> String f = """
>
>
> +--------+
> | text |
> +--------+
>
>
> """; // one or all leading or trailing blank lines stripped?
>
> String g = """
> +--------+
> | text |
> +--------+"""; // Last \n dropped
>
> String h = """+--------+
> | text |
> +--------+"""; // determine indent of first line using scanner knowledge?
>
> String i = """ "nested" """; // strip leading/trailing space?
>
> String j = ("""
> public static void """ + name + """(String... args) {
>     System.out.println(String.join(args));
> }
> """).align(); // how do we handle expressions with multi-line strings?
>
> String k = """
> public static void %s(String... args) {
>     System.out.println(String.join(args));
> }
> """.format(name); // is this the answer to multi-line string expressions?
>
> As we can see, there were a lot of cases where the user _probably_ wanted one
> thing, but _might have_ wanted another.
> What control knobs do we have that we could assign meaning to, and that would
> let the user choose either way? Candidates include:
>
> - The opening line (is it blanks followed by a newline, or are there
>   non-whitespace characters?)
> - The position of the close delimiter (is it on its own line, or not?)
>
> Similarly, we have a number of policy choices:
>
> - Do we allow content on the same lines as the delimiters?
> - Should we always add a final newline?
> - Should we strip blank lines? Only the first and last? All leading
>   and trailing?
> - How do we interpret auto-alignment on single-line strings? Strip?
> - Should we right-strip lines?
>
> And some syntax choices (not to be discussed now):
>
> - How do we indicate opt-out?
>
> Comments?
>
>
> Examples narrative. Don’t peek yet. Stop and comment first.
>
>
> Unlike most other Java constructs, multi-line strings force us to look at
> coding style "square on". Keep in mind that we are often guilty of making
> assumptions about developer coding style. For instance, we may assume that
> multi-line strings tend to be large elements. We may also assume that
> developers will declare static final String variables to keep multi-line
> strings from messing up their code. All very neat and tidy, but... we know
> from experience that developers will use multi-line strings everywhere, as
> they have with array initialization and large lambda bodies.
>
> From this, we recommend that multi-line string fat delimiters should follow
> the brace pattern used in array initialization, lambdas and other Java
> constructs. The open delimiter should end the current line. Content follows
> on separate lines, indented one level. The close delimiter starts a new
> line, back indented one level, followed by the continuation of the enclosing
> expression.
>
> So, as in this brace pattern:
>
> int[] ia = new int[] {
>     1,
>     2,
>     3
> };
>
> we have the fat delimiter pattern:
>
> String d = """
>     +--------+
>     | text |
>     +--------+
> """;
>
> and:
>
> String.format("""
>     public static void %s(String... args) {
>         System.out.println(String.join(args));
>     }
> """, name);
>
> The fat delimiter pattern also significantly helps with future editing in and
> around the multi-line string. For example, changing the length of the
> variable name in the above "String d =" example doesn't affect the
> positioning of the string content or the close delimiter.
>
> If we adopt this style, some of the answers to the questions about incidentals
> become easier or even moot. Other styles are still valid, but the result of
> automatic incidental handling may be surprising.
>
> Note that fat delimiters can be used on single lines. What are the semantics
> for auto-alignment in that case? The question of stripping whitespace and
> newlines is not really about alignment. It's about the rules for handling
> incidental characters in a fat delimiter string.
>
>
> Continuing with the examples, let's assume some basic (negotiable)
> auto-alignment rules:
>
> 1. All content lines are uniformly right-stripped. Whitespace at the end of
>    lines is not something that is consistently managed by IDEs/editors.
> 2. Line terminators are always translated to \n.
> 3. If the content after the open delimiter is empty, then the first line
>    terminator is discarded.
> 4. Content is left-justified while preserving relative indentation.
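The four rules above are concrete enough to sketch as a plain function over the raw characters between the open and close delimiters. The following is a minimal Java sketch under our own assumptions, not the proposed compiler algorithm or the `String::align` method from the thread: the names `AutoAlignSketch` and `align` are hypothetical, "empty" in rule 3 is read as "blank" (whitespace only), blank lines are ignored when computing the common indentation for rule 4, and each whitespace character (space or tab) counts as one column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the four negotiable auto-alignment rules.
// Input: the raw characters between the open """ and the close """.
class AutoAlignSketch {

    static String align(String raw) {
        // Rule 2: translate every line terminator (\r\n, \r, \n) to \n.
        String normalized = raw.replace("\r\n", "\n").replace('\r', '\n');

        // Split on \n, keeping trailing empty segments (limit -1).
        String[] parts = normalized.split("\n", -1);

        // Rule 3 (assumption: "empty" means blank): if only whitespace follows
        // the open delimiter, drop that first line and its terminator.
        int first = (parts.length > 1 && parts[0].isBlank()) ? 1 : 0;

        // Rule 1: right-strip every content line (blank lines become "").
        List<String> lines = new ArrayList<>();
        for (int i = first; i < parts.length; i++) {
            lines.add(parts[i].stripTrailing());
        }

        // Rule 4: find the smallest indentation among non-blank lines...
        int common = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!line.isEmpty()) {
                common = Math.min(common, leadingWhitespace(line));
            }
        }
        // ...and remove it from each non-blank line, keeping relative indentation.
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(line.isEmpty() ? line : line.substring(common));
        }
        return String.join("\n", out);
    }

    // Counts leading whitespace characters; tabs and spaces each count as one.
    private static int leadingWhitespace(String line) {
        int n = 0;
        while (n < line.length() && Character.isWhitespace(line.charAt(n))) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Example (d): content indented four spaces, close delimiter on its own line.
        System.out.print(align("\n    +--------+\n    | text |\n    +--------+\n"));
    }
}
```

Under these assumptions the sketch reproduces the RESULT lines given for (d), (g) and (i). Example (h) shows where a purely string-based approach falls short: if the second and third lines are indented to line up with the first, the raw body's first line has no leading whitespace, so the common indentation computed here is zero and nothing is removed. Recovering the intended alignment needs the column of the open delimiter, which only the scanner knows.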
>
> And as a reminder, in the last round we introduced or attempted to introduce
> the following String methods:
>
> - String::indent(n) - used to change indentation, line by line (in JDK 12)
> - String::align() and String::align(n) - used to manage incidental
>   indentation (didn't make it)
> - String::format as an instance method (resolution issues YTBD)
>
> __________________________________________________________________________________________________
> String a = """
> +--------+
> | text |
> +--------+
> """; // first characters in first column?
>
> RESULT:
> +--------+\n
> | text |\n
> +--------+\n
>
> The problem with this example is that it does not follow the fat delimiter
> pattern. Let's change the variable name "a" to "something".
>
> String something = """
> .......... +--------+
> .......... | text |
> .......... +--------+
> .......... """; // first characters in first column?
>
> The dots indicate all the places where we had to add whitespace to maintain
> the pattern used.
> __________________________________________________________________________________________________
> String b = """
> +--------+
> | text |
> +--------+
> """; // first characters in first column or indented four?
>
> RESULT:
> +--------+\n
> | text |\n
> +--------+\n
>
> Same maintenance problem as example (a).
>
> It still works, but the question here is: do we give meaning to indentation
> relative to the close delimiter? Or did we want:
>
>     +--------+\n
>     | text |\n
>     +--------+\n
>
> It's a nice trick, but it sabotages the fat delimiter pattern. We would always
> get at least one level of indentation, whether we wanted it or not. It may be
> better to code it as:
>
> String b = """
>     +--------+
>     | text |
>     +--------+
> """.indent(4);
>
> So the question here is: should it be possible to specify "extra" indentation
> through the positioning of quotes, or are we better off saying that any extra
> indentation should be done through library calls?
> Also note that the library calls might be subject to compile-time folding.
> __________________________________________________________________________________________________
> String c = """
> +--------+
> | text |
> +--------+
> """; // first characters in first column or indented several?
>
> RESULT:
> +--------+\n
> | text |\n
> +--------+\n
>
> The amount of indentation is not a problem, just an aesthetic issue.
>
> __________________________________________________________________________________________________
> String d = """
>     +--------+
>     | text |
>     +--------+
> """; // first characters in first column or indented four?
>
> RESULT:
> +--------+\n
> | text |\n
> +--------+\n
>
> Textbook fat delimiter pattern.
> __________________________________________________________________________________________________
> String e =
> """
> +--------+
> | text |
> +--------+
> """; // heredoc?
>
> RESULT:
> +--------+\n
> | text |\n
> +--------+\n
>
> Just an aesthetic issue.
> __________________________________________________________________________________________________
> String f = """
>
>
> +--------+
> | text |
> +--------+
>
>
> """; // one or all leading or trailing blank lines stripped?
>
> As is, this would generate:
> \n
> \n
> +--------+\n
> | text |\n
> +--------+\n
> \n
> \n
> \n
>
> If we stripped away all leading and trailing blank lines, we would then have
> to code it as:
>
> String f = "\n".repeat(2) + """
> +--------+
> | text |
> +--------+
> """ + "\n".repeat(2);
> __________________________________________________________________________________________________
> String g = """
> +--------+
> | text |
> +--------+"""; // Last \n dropped
>
> RESULT:
> +--------+\n
> | text |\n
> +--------+
>
> This one is likely okay. It's not the fat delimiter pattern, but the oddity
> makes it clear we mean something different: we want to drop the last \n.
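To make the trade-off in example (f) concrete, here is a hypothetical helper that applies the alternative "strip all" policy: remove every leading and trailing blank line, then end each remaining line with \n, as when the close delimiter sits on its own line. The class and method names, and the assumed as-is value (two blank lines, the box, then two blank lines), are ours; ordinary Java 11 string literals stand in for the proposed syntax. The sketch checks that the `"\n".repeat(2)` workaround from the thread rebuilds exactly that value, so the policy loses no expressiveness, only convenience.

```java
// Hypothetical helper contrasting two blank-line policies for example (f).
class BlankLinePolicySketch {

    // Assumed "strip all" policy: drop every leading and trailing blank line,
    // then terminate each remaining line with \n.
    static String stripBlankEdges(String aligned) {
        String[] lines = aligned.split("\n", -1);
        int start = 0;
        int end = lines.length; // exclusive
        while (start < end && lines[start].isBlank()) {
            start++;
        }
        while (end > start && lines[end - 1].isBlank()) {
            end--;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; i++) {
            sb.append(lines[i]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed as-is value: two blank lines, the box, then two blank lines.
        String asIs = "\n\n+--------+\n| text |\n+--------+\n\n\n";

        String stripped = stripBlankEdges(asIs);
        // The workaround from the thread: re-add the blank lines explicitly.
        String rebuilt = "\n".repeat(2) + stripped + "\n".repeat(2);

        System.out.println(stripped.equals("+--------+\n| text |\n+--------+\n")); // true
        System.out.println(rebuilt.equals(asIs));                                 // true
    }
}
```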
> __________________________________________________________________________________________________
> String h = """+--------+
> | text |
> +--------+"""; // determine indent of first line using scanner knowledge?
>
> RESULT:
> +--------+\n
> | text |\n
> +--------+
>
> We can do this because the compiler's scanner can determine the indentation
> on the open delimiter line. However, this one is problematic if we require a
> String method to duplicate the compiler's algorithm (String::align). Tool
> vendors may also find this one problematic.
> __________________________________________________________________________________________________
> String i = """ "nested" """; // strip leading/trailing space?
>
> RESULT:
> "nested"
>
> This one still follows the rules: it is both left- and right-stripped.
> __________________________________________________________________________________________________
> String j = ("""
> public static void """ + name + """(String... args) {
>     System.out.println(String.join(args));
> }
> """).align(); // how do we handle expressions with multi-line strings?
>
> Mid-string substitution gets messy fast. Let's break the example down to the
> following (without align):
>
> String j = """
> public static void """ + name + """(String... args) {
>     System.out.println(String.join(args));
> }
> """;
>
> This is the same as:
>
> String j =
> """
> public static void """
> + name +
> """(String... args) {
>     System.out.println(String.join(args));
> }
> """;
>
> This works fine if we say there is no trailing \n when the close delimiter is
> on the same line as the content. The other requirement is that each
> multi-line string component ends up with a common indentation. The odds of
> that happening are poor.
>
> I guess we're stuck with parentheses and String::align. Unless...
> __________________________________________________________________________________________________
> String k = """
> public static void %s(String... args) {
>     System.out.println(String.join(args));
> }
> """.format(name); // is this the answer to multi-line string expressions?
>
> RESULT:
> public static void methodName(String... args) {
>     System.out.println(String.join(args));
> }
>
> Maybe a better substitution solution.
> __________________________________________________________________________________________________
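The (k) approach can be approximated in current Java with the existing static `String.format`, which sidesteps the unresolved instance-method question. In this sketch, ordinary string literals stand in for an already-aligned multi-line literal, and the names `TemplateSketch`, `viaConcatenation` and `viaFormat` are hypothetical. It shows that one template with a `%s` placeholder produces the same text as the (j) style, while keeping the aligned text in a single piece instead of three fragments that would each need a common indentation.

```java
// Hypothetical comparison of the (j) and (k) styles, using ordinary literals
// in place of the proposed multi-line string syntax.
class TemplateSketch {

    // Style (j): the aligned text is split around the substituted name.
    static String viaConcatenation(String name) {
        return "public static void " + name + "(String... args) {\n"
             + "    System.out.println(String.join(args));\n"
             + "}\n";
    }

    // Style (k): one aligned template; %s marks the substitution point.
    // Uses the existing static String.format; no instance form is assumed.
    static String viaFormat(String name) {
        String template = "public static void %s(String... args) {\n"
                        + "    System.out.println(String.join(args));\n"
                        + "}\n";
        return String.format(template, name);
    }

    public static void main(String[] args) {
        String k = viaFormat("methodName");
        System.out.print(k);
        System.out.println(k.equals(viaConcatenation("methodName"))); // true
    }
}
```

One cost of the format-based approach: any literal `%` in the template must be written as `%%`, which matters for snippets such as SQL `LIKE` patterns or printf-style code.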