Re: Raw string literals and Unicode escapes

Remi Forax Tue, 27 Feb 2018 13:48:12 -0800

> From: "Guy Steele" <guy.ste...@oracle.com>
> To: "John Rose" <john.r.r...@oracle.com>
> Cc: "amber-spec-experts" <amber-spec-experts@openjdk.java.net>
> Sent: Tuesday, February 27, 2018 22:12:14
> Subject: Re: Raw string literals and Unicode escapes


>> On Feb 27, 2018, at 4:20 PM, John Rose <john.r.r...@oracle.com> wrote:

>> On Feb 27, 2018, at 11:48 AM, Brian Goetz <brian.go...@oracle.com> wrote:

>>>> So after this length, instead of the probability of seeing a character
>>>> being virtually 1, you have the opposite effect, because programming
>>>> languages (a human construct) are very regular in the set of chars they
>>>> use. So you do not need a repetition of a character to avoid a
>>>> statistical effect that does not occur. Being able to choose the escape
>>>> character is enough.

>>> The problem is not that it's enough, it's that it is too much. Having nine
>>> ways to say the same thing is too many; having infinitely many (e.g.,
>>> nonces) is worse. Having used the "pick your delimiter" approach taken by
>>> Perl, I find that you are *still* often bitten by the inability to find a
>>> good delimiter for embedding a snippet of a program written in a language
>>> similar to the outer language. And it surely makes code less readable,
>>> because many more things can be interpreted as quotes.

>> My experience tracks with Brian's. That's why I think the random string
>> model is more robust than some vague hope that languages won't overlap.

>> Yes, random strings are an outlier, but less so than you'd think. A typical
>> compression ratio for code is 5x, which means that if you replace "random
>> string of length 10" with "random code snippet of length 50" you get the
>> same analytic results. In order to exclude a close-quote, you need an
>> additional constraint, which in practical terms results in folks having to
>> grub around inside their raw strings looking for accidental quotes.

> Which leads us to the following theoretical result: the ```` mechanism does
> not require you to grub around in the interior of the string AT ALL if you
> don’t want to. All you need to know is the length. If the length of the raw
> string is n, and it does not begin or end with ` (a necessary check in any
> case), then using n-1 backquote characters before and after will always do
> the job.

> In practice, many programmers (and programs) will be willing to do a quick
> search to see whether “```” or failing that “````” happens to be absent from
> the raw string. :-)
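
The two strategies Guy describes can be sketched in Java roughly as follows. This is only an illustration, not anything from the raw string literal proposal: the class and method names are hypothetical, the scanning variant computes the shortest workable delimiter directly instead of trying three and then four backquotes, and the floor of one backquote for bodies shorter than two characters is my own assumption, since n-1 would otherwise be zero or negative.

```java
// Hypothetical helper, for illustration only; not part of any JDK API.
final class BacktickDelimiter {

    // Builds a string of n backquote characters.
    static String backquotes(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('`');
        }
        return sb.toString();
    }

    // Length of the longest run of consecutive backquotes in s.
    static int longestBacktickRun(String s) {
        int longest = 0;
        int current = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '`') {
                current++;
                longest = Math.max(longest, current);
            } else {
                current = 0;
            }
        }
        return longest;
    }

    // "Quick search" strategy: scan the body and use one more backquote
    // than the longest run it contains.
    static String shortestDelimiter(String body) {
        return backquotes(longestBacktickRun(body) + 1);
    }

    // "Length only" strategy: never look inside the body. A body of length n
    // that neither begins nor ends with a backquote contains at most n-2
    // consecutive backquotes, so n-1 of them cannot occur inside it.
    // Assumption: at least one backquote is used when n < 2.
    static String lengthOnlyDelimiter(String body) {
        if (body.startsWith("`") || body.endsWith("`")) {
            throw new IllegalArgumentException(
                "body must not begin or end with a backquote");
        }
        return backquotes(Math.max(1, body.length() - 1));
    }
}
```

For a body such as a`b``c (six characters, longest run of two), the scan yields a three-backquote delimiter, while the length-only rule yields five: always sufficient, but rarely the shortest.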

OK, I'm clearly in the minority here; the repetition pattern wins.

Rémi
