On May 7, 2019, at 3:14 PM, Brian Goetz <brian.go...@oracle.com> wrote:
>
>
>> TL;DR: Good framework; must also account for the
>> rectangle extraction rule (RER). A unified escape
>> sublanguage (ESL) is highly desirable, and I propose
>> adding <\ > and <\ LT WS*> as escapes for space
>> and for null string. The existing \ char is OK, and
>> should be "fattened" as a separate feature. I note
>> some issues with <\ u X X X X>.
>
> Agree in general with the desire to extend ESL with some whitespace
> sequences, though I take some issues with the syntax on \<nl> and \<space>.
> Some alternate ideas regarding \uxxxx.
>
> First, unicode escapes. Alex pointed out offline that we had worked our way
> into a linear thinking trap (again). In the first round, because we were
> focused on raw strings, we turned off \uxxxx processing in the body of a raw
> string, which raised the question of “how do we turn it back on.” And also
> that, while we use the same escape character for both, they occupy very
> different places in the language; the ESL is purely about string literals,
> whereas \uxxxx is purely a lexing concern.

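To make the phase distinction in the quoted paragraph concrete, here is a rough Java sketch of the two passes. It is a simplified model, not javac's implementation: the class PhaseOrderDemo and its method names are hypothetical, the first pass only approximates the JLS eligibility rule (a backslash can begin a Unicode escape only when it is preceded by an even number of contiguous backslashes), and the second pass handles just a handful of ESL escapes.

```java
// Hypothetical sketch: models "Unicode escapes first, string escapes second".
class PhaseOrderDemo {

    // Phase 1 (lexing): translate each eligible Unicode escape to its character.
    // The character produced here never starts another Unicode escape.
    static String translateUnicode(String src) {
        StringBuilder out = new StringBuilder();
        int run = 0; // contiguous raw backslashes immediately before position i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            boolean eligible = c == '\\' && run % 2 == 0
                    && i + 1 < src.length() && src.charAt(i + 1) == 'u';
            if (eligible) {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++; // the grammar allows more than one 'u' marker
                }
                if (j + 4 > src.length()) {
                    throw new IllegalArgumentException("truncated Unicode escape");
                }
                int code = 0;
                for (int k = 0; k < 4; k++) {
                    int d = Character.digit(src.charAt(j + k), 16);
                    if (d < 0) {
                        throw new IllegalArgumentException("bad hex digit");
                    }
                    code = code * 16 + d;
                }
                out.append((char) code);
                i = j + 4;
                run = 0;
                continue;
            }
            run = (c == '\\') ? run + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Phase 2 (ESL): interpret a small subset of string-literal escapes.
    static String translateEscapes(String body) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (++i >= body.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            switch (body.charAt(i)) {
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                default:
                    throw new IllegalArgumentException("escape not modelled in this sketch");
            }
        }
        return out.toString();
    }

    // Value of a string literal body under the current phase order.
    static String literalValue(String body) {
        return translateEscapes(translateUnicode(body));
    }

    public static void main(String[] args) {
        // Source body: backslash, u005c, n. Phase 1 yields backslash + n,
        // which phase 2 then reads as a newline escape.
        System.out.println(literalValue("\\u005cn").equals("\n")); // true

        // Source body: two backslashes, then u0041. The second backslash is
        // not eligible in phase 1, so phase 2 sees an escaped backslash.
        System.out.println(literalValue("\\\\u0041")); // backslash followed by u0041
    }
}
```

The first call is the user-visible surprise: the Unicode escape is gone before the ESL ever runs, so it can smuggle in a backslash that the ESL then interprets. The second call is the "we know how to escape the \" case from the quoted text.
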
I don't think that's the trap we are in. The trap is the Language
Experts Designing User Model trap, where LEs say "we don't need to deal
with \u because it's not the part of the JLS we are working on", and the
user says, "they are all just escapes, right?" The reason it's a trap is
that we think the user will be happy to learn and apply the geeky-fine
distinctions between the two superficially similar syntaxes.

One good way out of this particular trap is to carefully restrict the
allowed \uxxxx patterns in strings, so that the phase order becomes
irrelevant, and then move those patterns forward in the phase order along
with the other escapes.

We can also do as you are recommending, and ignore the problem. The only
difficulty there is occasionally having to ask the user to ignore the
problem also, by saying things like "yes, that's an escape sequence but
\u sequences break the rule you are trying to apply". Such as using
"\0040" to escape a space. How frequent is "occasionally"? I don't know;
if it's very infrequent then, yes, we can ignore this problem. It will
give puzzler authors some extra scope for their hobby.

> His recommendation, which (now that it's been explained to me) I strongly
> agree with, is: let’s not have this feature touch unicode processing at all.
> Let’s just leave unicode processing as is, using \uxxxx, whether in code,
> SLSLs, MLSLs, and any future “raw” SLs. The similarity between \n and \uxxxx
> is purely coincidental.

(That's why it's a LEDUM trap.)

> And if we really want the characters “\u0000” in a string literal, well, we
> know how to escape the \.
>
> Which brings us to \<eol> and \<space>. My main complaint here is that I am
> really uncomfortable using \<space> for “literal space”, because at the end
> of the line, one cannot differentiate between \<eol> and \<space> when
> reading the code. Alternatives include \_, or \s, or \., or … many others.

Personally, I'm fine with those.
By analogy with \n I suppose \s will be unsurprising; I don't care about this corner of the bikeshed, though. I certainly agree that having more than one "\ whitespace" sequence creates visual ambiguities; that's a good catch. — John