On Apr 26, 2019, at 8:59 AM, Kevin Bourrillion <kev...@google.com> wrote:
>
> On Fri, Apr 26, 2019 at 8:56 AM Kevin Bourrillion <kev...@google.com> wrote:
>
> Apparently bash's behavior is to replace <any amount of whitespace,
> backslash, newline, any amount of whitespace> with a single space character,
> and that at least seems like a useful behavior for us too if we're open to it.
No, it replaces <\ NL> with nothing at all. Any spaces before or after that two-character sequence are bystanders. In a separate step, if not inside quotes, all sequences of whitespace are treated as if they were single spaces, as the shell breaks a line up into words. The net is that the stuff you mentioned behaves like whitespace. But also:

```
$ x=a\
b
$ echo $x
ab
```

However, I'm proposing that horizontal whitespace *after* the newline is "gobbled up" and thrown away with the leading <\ LT>, so the escape sequence is more like <\ LT (SP|TAB)*>. This gives the programmer more control over program layout.

> I was forgetting, when I said this, that another substantial minority use
> case (I want to say at least 15%? These were rough estimates though) for
> multi-line strings is really long URLs, checksums, etc., that aren't meant to
> have any spaces in them at all. So the bash behavior is not necessarily what
> we'd want, although of course consistency with it has some amount of value in
> itself.

The actual bash behavior, described above, *is* what we want. If the programmer *wants* a space, one can be placed just before the <\ LT> sequence. Luckily, that's reasonably readable.

> Which raises another question: do we allow \<terminator> in SL strings? (I
> presume so, and we just eat the \ and the terminator.)

If we eat the (SP|TAB)* after LT, then we have given the programmer control over indentation, in a way that is consistent with the rectangle rule, but applies only to the one escaped (partial) line.

> Hmm, I can see how that could be harmless but it seems to blur the boundary
> between the features to me.

It seems that way. I think what's happening is another iteration of "Let's do raw strings! Wait, that's not what they really are," and now we are at "Let's do multi-line strings!" Brian's comment is that the tri-quote makes a better container for payloads with single quotes. Those payloads often have multiple lines too.
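For concreteness, here is a minimal Java sketch of the <\ LT (SP|TAB)*> escape proposed above, assuming it is applied to the raw string body before ordinary escapes are interpreted: the backslash, one line terminator (LF, CR, or CRLF), and any spaces or tabs after it are deleted, and nothing is inserted in their place. The class and method names (LineContinuation, joinEscapedLines) are hypothetical, not any JDK API.

```java
// Hypothetical sketch of the proposed "\<LT> (SP|TAB)*" escape; not a JDK API.
public final class LineContinuation {

    static String joinEscapedLines(String body) {
        StringBuilder out = new StringBuilder(body.length());
        int n = body.length();
        int i = 0;
        while (i < n) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < n) {
                char next = body.charAt(i + 1);
                if (next == '\n' || next == '\r') {
                    i++; // drop the backslash
                    // drop one line terminator; CRLF counts as a single terminator
                    if (body.charAt(i) == '\r' && i + 1 < n && body.charAt(i + 1) == '\n') {
                        i += 2;
                    } else {
                        i++;
                    }
                    // "gobble up" horizontal whitespace after the terminator
                    while (i < n && (body.charAt(i) == ' ' || body.charAt(i) == '\t')) {
                        i++;
                    }
                } else {
                    // any other escape (e.g. \\ or \n) is copied through untouched,
                    // so an escaped backslash at end of line is not a continuation
                    out.append(c).append(next);
                    i += 2;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A long token split across lines: no space survives the join.
        System.out.println(joinEscapedLines("https://example.com/a/very/long/\\\n        path"));
        // prints https://example.com/a/very/long/path

        // A space placed just before the backslash is kept.
        System.out.println(joinEscapedLines("hello \\\n    world"));
        // prints hello world
    }
}
```

The two calls in main mirror the two cases discussed in the thread: a long URL or checksum that must not pick up spaces, and ordinary text where the programmer opts into a space by writing it before the backslash.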
So it's really "fatter strings", in some sense. We might say we are making strings with *unescaped LTs*. The rectangle rule shows up as soon as we realize that programmers have strong opinions about spacing and want to indent their code so it is readable. (Pretty too; beauty is a proxy for readability, I suppose.) So if we let the programmer start putting paragraphs into string bodies, we also have to let the programmer manage indentation. And it's a short and natural step from exdenting to line-breaking, IMO.

We might say we are making *more readable syntax for large strings*. Minimizing escape sequences makes them readable, and so does giving the programmer control over program layout. Such "readable strings" make some sense for one-liners also, especially if we extend the 2D rectangle rule to the 1D case and strip leading and trailing whitespace adjacent to the triquotes.

In the end, we might just dub them "fat strings".

-- John