Hi!
> On 9 Dec 2014, at 02:14, [email protected] wrote:
>
> 2014-12-09 0:51 GMT+01:00 Andrea Faulds <[email protected]>:
>>
>> https://wiki.php.net/rfc/unicode_escape
>
>
> Still leaves unmentioned that there was already an established Unicode
> escape syntax. PCRE provides \x{1F520} for codepoints in conjunction with
> plain \xFF for byte escapes.
Interesting, I was unaware of that until now, thanks for pointing this out.
> Maybe there should be more elaboration on why PHP itself should go with
> the \u{xxxx} ECMAScript representation, thus introducing a syntax disparity
> with our most major string handling extension.
Well, PCRE does what it does probably because of its name: *Perl-Compatible*
Regular Expressions. Perl has the \x syntax. But PCRE’s syntax comes from what
suits Perl, not PHP, so I don’t see why we should necessarily match its
behaviour. If we add \x{xxxx} syntax to PHP’s string literals, then we’ll
break existing code which uses double-quoted strings for regular expressions,
since PHP would expand the escape itself before PCRE ever sees it.
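To make the breakage concrete, here's a rough sketch. The pattern and subject
are just ones I picked for illustration, and $hypothetical only simulates what
PCRE would be handed if PHP expanded the escape (an assumption, not actual
behaviour):

```php
<?php
// Today "\x{" is not an escape in PHP's double-quoted strings, so the
// backslash survives and PCRE receives the text \x{2E} (a literal dot).
$current = "/\x{2E}/";

// Simulation (assumption): if PHP expanded \x{2E} itself, PCRE would
// receive a bare "." instead, which matches any character.
$hypothetical = '/./';

$matchesToday = preg_match($current, 'a');      // 0: "a" is not a dot
$matchesAfter = preg_match($hypothetical, 'a'); // 1: "." matches anything

var_dump(strlen($current));              // int(8), the escape is untouched
var_dump($matchesToday, $matchesAfter);  // int(0), int(1)
```

The same pattern string would silently change meaning, which is the kind of
break I'd like to avoid.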
I think \x{xxxx} is misleading anyway - \xXX is always a single byte/character,
yet Unicode code points can’t be represented in PHP strings as single bytes
when encoded in UTF-8 (unless they’re below U+0080, of course). If I saw
"\x{abcd}" I’d expect it to be the same as "\xab\xcd". Plus, while Perl has
\x{xxxx} syntax, Ruby and ECMAScript 6 have the \u{xxxx} syntax, so \u{xxxx} is
already more popular. The ‘u’ in \u{xxxx} also makes it more obviously
“Unicode”.
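A quick sketch of the mismatch I mean, comparing the two bytes 0xAB 0xCD that
a byte-wise reading of \x{abcd} suggests with the actual UTF-8 encoding of
U+ABCD. Using json_decode() to get the encoded code point is just my
shortcut for the example, not something the RFC relies on:

```php
<?php
// What a byte-wise reading of \x{abcd} suggests: the bytes 0xAB 0xCD.
$byteWise = "\xab\xcd";

// The UTF-8 encoding of the code point U+ABCD. json_decode() is used here
// only as a convenient way to obtain it from a JSON \uXXXX escape.
$utf8 = json_decode('"\uabcd"');

var_dump(bin2hex($byteWise)); // string(4) "abcd"
var_dump(bin2hex($utf8));     // string(6) "eaaf8d"
var_dump(strlen($utf8));      // int(3)
```

Two bytes versus three, and none of the three match, so the \x spelling
hints at the wrong thing.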
Thanks!
--
Andrea Faulds
http://ajf.me/
--
PHP Internals - PHP Runtime Development Mailing List