> On 24 Nov 2014, at 22:21, Sara Golemon <poll...@php.net> wrote:
> 
> On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds <a...@ajf.me> wrote:
>> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
>> 
> I'm okay with producing UTF-8 even though our strings are technically
> binary.  As you state, UTF-8 is the de-facto encoding, and recognizing
> this is pretty reasonable.

On that note, it strikes me now that we already assume an ASCII-compatible
encoding for all escape sequences anyway. If I’m using EBCDIC or UTF-16, “\n”
isn’t going to help me much!
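
To make that concrete, here is a small sketch in Python (used purely for 
illustration, not PHP) of what a line feed looks like as bytes in a few 
encodings. The cp037 codec is just one EBCDIC code page picked as an example.

```python
# A line feed character, encoded three different ways.
newline = "\n"

ascii_bytes = newline.encode("ascii")       # the single byte 0x0A
utf16_bytes = newline.encode("utf-16-le")   # two bytes: 0x0A 0x00
ebcdic_bytes = newline.encode("cp037")      # one byte, but not 0x0A

# An escape that always emits the byte 0x0A only means "newline" when the
# surrounding text is in an ASCII-compatible encoding.
print(ascii_bytes.hex(), utf16_bytes.hex(), ebcdic_bytes.hex())
```

So an escape that emits a fixed byte already bakes in an encoding assumption.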

> You may want to make it a requirement that strings containing \u
> escapes are denoted as:   u"blah blah"    We set aside this format
> back in the PHP6 days (note that b"blah" is equivalent to "blah" for
> binary strings).

I’d rather keep u"blah blah" for if/when we add actual Unicode strings.

> On the BMP versus SMP issue of \uXXXX styles, we addressed this in
> PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six
> hexit codepoints.   e.g.    "\u1234" === "\U001234"   I'd rather
> follow this style than making \u special and different from hex and
> octal notations by using braces.

That is something I’d thought about. \U takes 8 hex digits in every other 
language which has it, though.
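
Python is one such language, so it makes a handy illustration: there \u takes 
exactly four hex digits and \U exactly eight. This is only a sketch of the 
existing behaviour elsewhere, not a proposal for PHP.

```python
# \u takes exactly 4 hex digits, \U exactly 8 (Python's rules).
bmp_short = "\u1234"
bmp_long = "\U00001234"
assert bmp_short == bmp_long

# A codepoint outside the BMP needs the long form.
astral = "\U0001F602"
utf8 = astral.encode("utf-8")   # four bytes: f0 9f 98 82

# With the short form, only the first four digits are consumed:
# this is U+1F60 followed by a literal "2", not U+1F602.
truncated = "\u1F602"
print(len(astral), len(truncated), utf8.hex())
```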

I suppose we could do this; it certainly resolves the BMP issue. Still, I 
think the brace syntax has its advantages: it’s completely unambiguous, and 
it means we have only one syntax for this rather than two (less mental 
overhead). Plus, it’s worth noting that \u would still be different from 
\ooo and \xXX anyway, as it’d be fixed-length while octal and hex aren’t.
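
As a rough sketch of why the braces are unambiguous, here is a toy expander 
written in Python for illustration. The function name and the error handling 
are my own assumptions, not the RFC's implementation; it just shows that the 
closing brace marks where the codepoint ends, whatever follows it.

```python
import re

# Hypothetical helper, not the RFC's implementation: matches \u{H...}
# where H... is one or more hex digits.
_BRACED_ESCAPE = re.compile(rb"\\u\{([0-9A-Fa-f]+)\}")


def expand_braced_escapes(raw: bytes) -> bytes:
    """Replace each \\u{...} escape with the UTF-8 bytes of that codepoint."""

    def replace(match):
        codepoint = int(match.group(1).decode("ascii"), 16)
        if codepoint > 0x10FFFF:
            raise ValueError("codepoint out of range: %X" % codepoint)
        # Assumption: surrogates are simply left to fail in encode().
        return chr(codepoint).encode("utf-8")

    return _BRACED_ESCAPE.sub(replace, raw)


# The digit after the closing brace stays a literal "1"; there is no
# question of whether it belongs to the escape.
print(expand_braced_escapes(rb"caf\u{e9} \u{1F602}1"))
```

The same escape works for BMP and non-BMP codepoints, so a second \U form is 
not needed.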

--
Andrea Faulds
http://ajf.me/





--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
