On Mar 13, 2019, at 11:56 AM, Kevin Bourrillion <kev...@google.com> wrote:
>
> - Multi-line-ness and raw-ness are orthogonal concepts.
>
> Is that true, as stated? I would have said that any support for rawness
> automatically gives you support for multi-line-ness by nature, because a
> newline character becomes literal. That doesn't seem like orthogonality.

True orthogonality means that the two vectors have a cosine of zero. That's a stronger condition than independence, which is a cosine of less than one.

There are at least two interesting factors that pull the cosine between "raw" and "multiline" away from zero. One is that raw implies multiline, as you just pointed out, Kevin. A second goes the other way: Multiline asks for raw, because of its scaling properties. Classic escapes are *less appropriate* to multiline strings than classic single-line strings. This point is touched on here:

>> - For multi-line strings, a stronger delimiter (e.g., """) seems to be
>> preferred on readability grounds, because people don't want to have to
>> squint to see where the embedded code ends and the Java code resumes.
>>
> Valid point. Today, every line or group of lines in a .java source file is
> Java code, but now there will be sections where that's not at all clearly the
> case. Making the boundaries clear between the two types of code seems like a
> good practice. The old proposal allowed a single backtick to offset these
> sections in 99% of cases, but it occurred to me that developers would often
> be better off using more of them just to delineate better…

But I think the point is a little stronger. We can expect that normal code has visually limited line lengths, but visually unlimited line counts. Even if we believe that well-behaved multi-line strings will fit in a single screenful, it is the case that the scale of a single-line string is the scale of a single screen line, while the scale of a multi-line string is a *whole screen*. It is a *questionable assumption* that escape sequence notations will work just as well at the larger scale as the known-good smaller scale. And we question that assumption when we speak of "squinting" as above.

Let's be clear about this: Squinting through a page of code for escapes is at least N times harder than squinting through a line of code, where N is the page size.
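
As a rough illustration of the scale argument, here is a small sketch using only classic Java string literals. The class, constant, and helper names are hypothetical and do not come from any proposal in this thread. It puts one escaped single-line regex next to a short multi-line payload written with the same escape notation:

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of any proposal.
public class EscapeScale {
    // Single-line scale: a regex matching a path such as \dir\file.txt.
    // Each regex backslash must itself be escaped in the Java literal.
    public static final String PATH_REGEX = "\\\\\\w+\\\\\\w+\\.txt";

    // Multi-line scale: the same escape notation, spread across a block.
    // The \" and \n escapes must be picked out of the whole payload.
    public static final String HTML =
        "<p class=\"note\">\n" +
        "  See <a href=\"docs/readme.txt\">the \"readme\"</a>\n" +
        "</p>\n";

    // Counts occurrences of one character in a string's runtime value.
    public static int countChar(String s, char c) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches(PATH_REGEX, "\\dir\\file.txt"));
        // Every quote and newline in the value was written as an escape.
        int escapes = countChar(HTML, '"') + countChar(HTML, '\n');
        System.out.println("escapes in the multi-line payload: " + escapes);
    }
}
```

In the regex, the escapes are dense but confined to one screen line. In the three-line payload they are scattered through the bulk, and that scattering grows with every added line.
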
Raw strings give a clear and plausible answer to this problem posed by multi-line strings, hence my conclusion that they are (for this reason among others) not fully orthogonal features. The answer is, "we won't put any escape sequences into the bulk, we will only put them at the boundary". Boundaries are *always* (barring fractals) smaller than bulks.

Another part of the answer, which has been derived again in a previous message, is "we'll put a big-enough escape sequence at the boundary so you'll have a fighting chance to see it in the bulk". I think that's the real reason why, after inspecting single-" as a multi-line delimiter, we always discard it in favor of something more distinctive, with multiple characters. The clever discoveries of payloads which introduce the short closing quote are interesting puzzles, but they are just special cases of the general rule that, if you are going to spray a large bulk of string payload on the screen, you are going to need a larger unit of visual information to make a clearly evident ending fence for it. That more general rule does not appeal to dubious assertions like "this will only be for SQL and five more notations, we promise". Especially if we (later?) allow the ending fence to grow as large and robust as each use case requires. (That's my argument for "strong quotes" in all sizes, of course.)

I guess where this ends for me is that, not buying the orthogonality argument, I more easily see raw as a better first course, because it picks up most of the multi-line use cases, and also the case of single-line regular expressions.

-- John