On May 1, 2018, at 2:19 PM, Guy Steele <guy.ste...@oracle.com> wrote:
>
> the convention that if the last line consists entirely of whitespace and does
> not end in a newline, then it should be stripped _and furthermore the exact same
> amount of whitespace should be stripped from all other lines in the literal_
Seconded. (And see discussion of case y in my earlier note to amber-dev, where the final line is the control line. Relevant part copied below.)

I keep coming back to the idea that the final line of the quote is the best place to control indentation stripping. Here's a rule we could make: If the trailing line of the literal is blank (except for indentation), then it is treated as part of the payload delimiter. In that case, that whitespace must be uniformly present as leading indentation on all other lines, and it is also stripped from every line of the quote body. The leading newline (if any) is also stripped.

    String y = `
    ..___line one
    ..line fifty-two
    ..___line ninety-nine
    ::`;

The payload starts with "___line one" and ends with "___line ninety-nine". (Here underbar _ is a non-stripped space and colon : is the controlling stripped space, while periods are the non-controlling copies of the stripped space.)

By declaring that the final line gets stripped, the literal's payload is fully and exactly contained in the displayed source code lines between the two stripped lines of the literal. That is not possible unless the last line is stripped as well as the first.

More: *We can specify that it is an error if the identical leading whitespace is not present on every payload line and also the stripped trailing line.* This means there are no invisible surprises: What you see at the end of the string is the same as everywhere else.

Previous versions of the stripping rule distribute the responsibility across all the lines, but make it difficult or impossible to find the line with the shortest indent, since (a) it might be in the middle of the literal, or (b) it might even display the same as other lines with a different combination of spaces and tabs. By contrast, making the trailing line uniquely responsible for controlling the stripping removes situation (a), and requiring other lines to have the same leading space substantially removes situation (b).
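A minimal Java 11 sketch of the trailing-line rule, assuming the raw body of the literal (the characters between the opening and closing backticks) is already available as a `String` with `\n` line terminators. `TrailingLineIndent.strip` is a hypothetical name, not part of any proposed API. It treats an all-blank trailing line as the control line, drops an empty leading line, and requires every other line to start with exactly the control whitespace.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: the all-blank trailing line of a multi-line raw string controls exdenting. */
final class TrailingLineIndent {
    private TrailingLineIndent() {}

    /** Assumes body is the raw text between the delimiters, using '\n' line terminators. */
    static String strip(String body) {
        String[] lines = body.split("\n", -1); // limit -1 keeps an empty trailing line
        if (lines.length < 2) {
            return body; // single-line literal: there is no trailing control line
        }
        String control = lines[lines.length - 1];
        if (!control.isBlank()) {
            return body; // trailing line is not pure indentation, so the rule does not apply
        }
        int start = lines[0].isEmpty() ? 1 : 0; // the leading newline (if any) is stripped too
        List<String> payload = new ArrayList<>();
        for (int i = start; i < lines.length - 1; i++) {
            String line = lines[i];
            // Exact prefix match: a tab never matches a space, so mixed indentation is rejected.
            if (!line.startsWith(control)) {
                throw new IllegalArgumentException("unaligned indent before \"" + line.strip() + "\"");
            }
            payload.add(line.substring(control.length()));
        }
        return String.join("\n", payload);
    }
}
```

With real spaces in place of the dots, colons and underbars, case y yields "   line one", "line fifty-two" and "   line ninety-nine" joined by newlines, and y_err is rejected. Under this strict reading even an empty line in the middle of the literal must carry the full prefix.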
    String y_err = `
    ..___line one
    _line fifty-two
    ..___line ninety-nine
    ::`;  // error: unaligned indent before "line fifty-two"

I think we should do this. It would make it a little (a *very* little) harder to correctly write indented

What if the trailing line has non-space characters? Fine; don't exdent that literal (that's option A). Or (option B) exdent by all the leading whitespace characters, and tack on the remaining part of the final line to the payload; that gives a hook for ending a multi-line literal without a newline but keeping the indent feature. (We have to favor one and disfavor the other, among the two options of trailing newline and non-trailing newline.) There's a third choice (option E): reserve that condition for future use.

FWIW I like option A as the simplest back-off from fancy stuff:

    String y_A = `
    _____line one
    __line fifty-two
    _____line ninety-nine
    __line 100`;  // => really raw, no indent stripping

Rationale: The trailing line controls exdenting, but *only if* it is all whitespace: All the exdent and nothing else.

What about the leading line? Should it have its indent stripped? No, because that doesn't help make clean indented rectangles of source code; stripping that space would be pure puzzler with no upside. In fact, any non-empty first line is *not* going to align with the rest of the lines, if indentation is in play. Therefore, we have options similar to those for dealing with non-blanks in the trailing line: Option A1 is to turn off exdenting altogether if the first line is non-empty (that's S. Colebourne's proposal too, I think). Option B1 is to keep the first line as-is, even though it won't align with the rest of the rectangle, and exdent the rest. Option E1 is to disallow a non-empty leading line. For completeness, option E1A is to disallow a leading line *if it begins with whitespace*, but if it begins with non-whitespace, turn off exdenting.
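The back-off options above can be combined into one decision procedure. This Java 11 sketch picks A for the trailing line and A1 plus E1A for the leading line, which is only one of the combinations discussed; `ExdentOptions.process` is a hypothetical name, and it assumes the same input as before (raw body between the backticks, `\n` terminators).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch combining options A, A1 and E1A: a literal is either
 * raw-means-raw (returned unchanged) or raw-is-a-rectangle (exdented), or it is an error.
 * Assumes the raw body uses '\n' line terminators.
 */
final class ExdentOptions {
    private ExdentOptions() {}

    static String process(String body) {
        String[] lines = body.split("\n", -1);
        if (lines.length < 2) {
            return body; // single-line literal: always raw
        }
        String first = lines[0];
        if (!first.isEmpty()) {
            if (Character.isWhitespace(first.charAt(0))) {
                // Option E1A: a leading line that begins with whitespace is an error.
                throw new IllegalArgumentException("unaligned indent before \"" + first.strip() + "\"");
            }
            return body; // Option A1: any other non-empty leading line turns exdenting off
        }
        String control = lines[lines.length - 1];
        if (!control.isBlank()) {
            return body; // Option A: a trailing line with non-blanks is payload, not control
        }
        List<String> payload = new ArrayList<>();
        for (int i = 1; i < lines.length - 1; i++) {
            String line = lines[i];
            if (!line.startsWith(control)) {
                throw new IllegalArgumentException("unaligned indent before \"" + line.strip() + "\"");
            }
            payload.add(line.substring(control.length()));
        }
        return String.join("\n", payload);
    }
}
```

The leading-line check runs before the trailing-line check, so a literal like y_E1A, whose first line begins with whitespace, is reported as an error even though its trailing line would otherwise make it raw.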
(Note that under these rules, if the trailing line begins with non-whitespace, exdenting is also turned off.)

I think B1 is bad: It breaks up the rectangle. I'd like to say that we don't ever break up rectangles; if a proper text rectangle can't be formed in the source code, then exdenting is turned off. No partial exdents. I guess A1 is consistent with the previous A, but so is E1A.

    String y_E1A = `__spacey
    ..___line one
    ..line fifty-two
    ..___line ninety-nine
    ::line 100`;  // => error: unaligned indent before "spacey"

Bottom line suggestions:

1. Control indent/exdent string by defining it precisely as the *trailing* line.
2. Omit that trailing line (if it is all-blank), because it is pure control, not payload.
3. If the trailing line has non-blanks, it's not indent control, so don't strip or omit anything. (B: Or split such a trailing line into leading blanks for indent control and payload.)
4. If stripping, require that *every* payload line without exception have the same prefix.
5. If the leading line is not empty, don't strip anything. (Rectangles wouldn't align anyway.)
6. Conversely, if stripping, omit the leading line: It can't contribute anything to a rectangle.
7. Make some edge conditions errors (as in 4) and others "do not strip" cases (as in 3, 5).

Net model is you are either raw-means-raw or raw-is-a-rectangle. The latter mode is the only way lines are omitted or left-indents are stripped. To get into the latter mode, you have to have a well-formed rectangle with no oddities. If there's an oddity, you get an error (if it would be hard to read) or you back off to raw-means-raw. A multi-line string that doesn't have a leading newline is raw-means-raw, no exceptions.

One downside to putting all the weight on the trailing line: You don't get all of Kevin's style choices. You have to indent the trailing quote the amount you expect to have stripped. But on balance this is IMO a feature, not a bug: The exdent level is defined in one unique place.

— John

P.S.
And for the record, here's my errant message to amber-dev:

From: John Rose <john.r.r...@oracle.com>
Subject: Re: Raw String Literals (RSL) - indent stripping of multi-line strings
Date: April 23, 2018 at 12:20:02 PM PDT
To: Jim Laskey <james.las...@oracle.com>
Cc: amber-dev <amber-...@openjdk.java.net>

> - Should trailing whitespace be stripped?

As with the "all-indented" case above, trailing space should be stripped only if there is a way to opt out of stripping. I think the trimMarkers API is the way to cover this use case, since it is rather specialized.

> - Should the first or last line be removed if blank?

Yes. In essence, the syntax of a quote sequence includes a line terminator. This BTW allows non-periodic quote sequences, which as a corollary allows leading and trailing quote sequences to be encoded in the RSL:

    var hasLeadingAndTrailingTick = ``
    `I went for a walk in the tall brush and picked up some riders.`
    ``;

Also, the removal of leading and trailing blank lines gives users some degrees of stylistic freedom that seem to be customary, along with the indent-stripping.

Here's a new point along these lines, if I may be so bold: If we are sticking non-payload stylistic inputs into RSLs, we should consider opening up a reservation for future use, in the form of RSL configurations which are declared to be illegal. We could declare that some obviously pathological subset of near-misses to an indent-stripped RSL is illegal, and reserved for future extension.

On the other hand, we are trying very hard to accept every RSL the user could randomly type in, which is incompatible with reserving a set of constructs for future use. This isn't logically necessary in the style-control use cases; we can simply declare that some style-control is just illegal, if we think there's a chance of using that coding space in the future.
By obviously pathological I mean something like one or all of these:

    String x = `_
    ..line one
    ..line two
    ..`;

    String y = `
    ..___line one
    ..line fifty-two
    ..___line ninety-nine
    ..`;

    String z = `
    ..line one
    ..line two
    ..___`;

(Here underbar _ is a non-stripped space.)

In case x, there is whitespace on the non-determining blank first line. Surprisingly, this space doesn't get stripped (under the proposed rules).

In case y, line 52 determines the indent to strip, and this is true even if it is buried in the middle of 100 lines. Luckily, in this case, the determining last line (just before the close-quote) ratifies this choice, so there is a unique place to look for the stripped indent, without searching the whole string.

In case z, the stripped last line, while a determining line, has extra whitespace. This is easy to miss.

I suggest placing a structural constraint on stripped indents: the last line, if blank, is stripped, and if stripped, must be of length exactly zero after the leading indent is trimmed. That would rule out z and ameliorate y. I also suggest ruling out x by requiring that the first line, which is non-determining, must not have leading whitespace at all. This doesn't break any of your examples a-f.

Removing cases x and z might remove a class of puzzler about the significance of leading white spaces near the ends of RSLs. (Can anyone see a positive use case for them that can't be easily adjusted to a less pathological form?)

And (getting back to extensions) ruling out x also gives us a tidy little subspace of RSLs to reserve for future use. In other words, an RSL with multiple lines whose leading line begins with a space can be defined, in future iterations of this feature, to include envelope information about the RSL, after that space.
Something like this:

    String q = `_{cool RSL header invented by our successors}
    ..line one
    ..line two
    ..`;

This envelope information would *not* be included in the payload, but would be stripped as if the leading line were purely blank. It would somehow control the processing of the RSL payload, and/or the parsing of the rest of the RSL. So in this future feature, the first line would still not be a determining line, and would be stripped completely, and the stuff between braces would be used in some way we can't define at present. I suppose it could have to do with processing embedded escapes.

    String r = `_{cool RSL header invented by our successors}
    ..line one
    ..line two {cool embedded stuff enabled by RSL header}
    ..`;

But there's no way to say at this moment what such a future syntax would look like, and that's my point: For now we can reserve a corner of the RSL encoding space for futures. We might never exercise the option, but it seems wise to buy the option, if it can be bought cheaply as a side effect of restrictions on pathological indent management.

I didn't raise this earlier though it was on my mind, but as you see the complexity trade-offs change with built-in indent stripping.

And, obviously, there are other ways to extend RSLs in the future which may seem better, such as by adding prefixes before the string quote. If we don't put constraints on cases x and z above, we still have options for future extension. Conversely, even if we are sure we want to make other choices regarding futures, I think it is a safe move to exclude x and z above.

— John
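The two constraints proposed in the forwarded amber-dev message (ruling out cases x and z) can be sketched on top of the older shortest-indent rule. This Java 11 sketch is hypothetical throughout: `PathologicalIndentChecks.strip` is an illustrative name. It assumes the determining lines are every line after the first, with interior blank lines skipped (that skipping is an assumption, not something the message specifies), and that the indent is the smallest count of leading whitespace characters among them.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: shortest-indent stripping plus the two proposed constraints
 * that rule out cases x and z. Assumes '\n' line terminators.
 */
final class PathologicalIndentChecks {
    private PathologicalIndentChecks() {}

    /** Counts leading whitespace characters; a tab counts as one character, like a space. */
    private static int leadingWhitespace(String line) {
        int n = 0;
        while (n < line.length() && Character.isWhitespace(line.charAt(n))) {
            n++;
        }
        return n;
    }

    static String strip(String body) {
        String[] lines = body.split("\n", -1);
        if (lines.length < 2) {
            return body; // single-line literal: nothing to strip
        }
        String first = lines[0];
        // Constraint against case x: the non-determining first line must not start with whitespace.
        if (!first.isEmpty() && Character.isWhitespace(first.charAt(0))) {
            throw new IllegalArgumentException("leading line starts with whitespace (case x)");
        }
        int lastIndex = lines.length - 1;
        String last = lines[lastIndex];
        // Determining lines: every line after the first; skipping interior blank lines is an assumption.
        int indent = Integer.MAX_VALUE;
        for (int i = 1; i <= lastIndex; i++) {
            if (i < lastIndex && lines[i].isBlank()) {
                continue;
            }
            indent = Math.min(indent, leadingWhitespace(lines[i]));
        }
        boolean lastIsBlank = last.isBlank();
        // Constraint against case z: a stripped blank last line must be empty once the indent is removed.
        if (lastIsBlank && last.length() != indent) {
            throw new IllegalArgumentException("extra whitespace on the stripped last line (case z)");
        }
        List<String> out = new ArrayList<>();
        if (!first.isEmpty()) {
            out.add(first); // a non-empty first line is kept as-is; it never determines the indent
        }
        int end = lastIsBlank ? lastIndex : lines.length; // a blank last line is part of the delimiter
        for (int i = 1; i < end; i++) {
            String line = lines[i];
            out.add(line.substring(Math.min(indent, line.length())));
        }
        return String.join("\n", out);
    }
}
```

Case y still strips two columns here, but only because the blank last line agrees with line 52; the check on the last line is what makes that agreement mandatory. The same helper also accepts the tick example from the message: a body consisting of a newline, the ticked sentence, and a newline comes back as just the sentence, backticks included.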