Re: [PHP-DEV] [VOTE][RFC] Unicode Codepoint Escape Syntax

Andrea Faulds Mon, 08 Dec 2014 18:45:07 -0800

Hi!

> On 9 Dec 2014, at 02:14, [email protected] wrote:
> 
> 2014-12-09 0:51 GMT+01:00 Andrea Faulds <[email protected]>:
>> 
>> https://wiki.php.net/rfc/unicode_escape
> 
> 
> Still leaves unmentioned that there was already an established Unicode
> escape syntax. PCRE provides \x{1F520} for codepoints in conjunction with
> plain \xFF for byte escapes.


Interesting. I was unaware of that until now; thanks for pointing it out.

> Maybe there should be more elaboration on why PHP itself should go with
> the \u{xxxx} ECMAScript representation, thus introducing a syntax disparity
> with our most major string handling extension.

Well, PCRE does what it does probably because of its name: *Perl-Compatible*
Regular Expressions. Perl has the \x{xxxx} syntax. But PCRE’s syntax comes from
what suits Perl, not PHP, so I don’t see why we should necessarily match its
behaviour. If we add \x{xxxx} syntax to PHP’s string literals, then we’ll
break existing code which uses double-quoted strings for regular expressions:
PHP would expand the escape itself before PCRE ever saw it.
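
To make the breakage concrete, here is a minimal sketch. The pattern is a made-up example, and the second pattern is written by hand to simulate what a hypothetical \x{xxxx} string escape would hand to PCRE; it is an assumption about the effect, not output from any real build.

```php
<?php
// A regex kept in a double-quoted string, as plenty of existing code does.
// PHP does not treat "\x{" as an escape today, so the backslash survives
// and PCRE receives the characters \x{2E} unchanged.
$pattern = "/^\x{2E}$/";

var_dump($pattern === '/^\x{2E}$/');   // the string still holds the raw escape
var_dump(preg_match($pattern, '.'));   // PCRE reads \x{2E} as a literal full stop
var_dump(preg_match($pattern, 'a'));   // so any other character is rejected

// Assumption: had PHP expanded \x{2E} itself, PCRE would see this instead.
$expanded = '/^.$/';
var_dump(preg_match($expanded, 'a'));  // the bare dot now acts as a wildcard
```

The same source text would silently change meaning, which is the kind of break I would rather avoid.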

I think \x{xxxx} is misleading anyway - \xXX is always single-byte/character,
yet Unicode code points can’t be represented in PHP strings as single bytes
when encoded in UTF-8 (unless they’re below U+0080, of course). If I saw
"\x{abcd}" I'd expect it to be the same as "\xab\xcd". Plus, while Perl has
\x{xxxx} syntax, Ruby and ECMAScript 6 have the \u{xxxx} syntax, so \u{xxxx} is
already more popular. The ‘u’ in \u{xxxx} also makes it more obviously
“Unicode”.
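
A small sketch of the byte-versus-code-point difference, assuming the RFC's \u{xxxx} escape is available (it needs PHP 7.0 or later) and that it produces UTF-8 as the RFC proposes. The variable names are only for illustration.

```php
<?php
// "\xXX" always yields exactly one byte.
$byte = "\xE9";

// "\u{E9}" yields code point U+00E9 in its UTF-8 encoding, which is two bytes.
$codePoint = "\u{E9}";

var_dump(strlen($byte));        // byte count of the \x form
var_dump(strlen($codePoint));   // byte count of the \u form
var_dump(bin2hex($codePoint));  // the UTF-8 bytes, in hex

// Two byte escapes versus one code point escape with the same hex digits.
$twoBytes = "\xab\xcd";
$oneCodePoint = "\u{abcd}";

var_dump(bin2hex($twoBytes));      // the raw bytes AB CD
var_dump(bin2hex($oneCodePoint));  // U+ABCD as UTF-8, a different byte sequence
```

The two forms only coincide for ASCII, which is why I would rather not reuse the \x prefix for code points.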

Thanks!
--
Andrea Faulds
http://ajf.me/





--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
