Re: [PHP-DEV] [RFC] Unicode Escape Syntax

Andrea Faulds Mon, 24 Nov 2014 15:37:06 -0800

> On 24 Nov 2014, at 23:29, Alain Williams <a...@phcomp.co.uk> wrote:
> 
> There is a big difference with \u or \U and \x or \o and that is the number of
> characters that follow the escape. \x has 2, \o has 3 - both are short and easy
> to count with the eye. \U012345 is quite long and it is not so visually obvious
> where it should end.
> 
> Ergo: I prefer Andrea's "\u{0123}" as it is going to be more robust
> against typos.


Typos are an angle I hadn’t quite considered, but yes, this syntax is more 
robust against them. Importantly, a broken literal is a compile error, whereas 
if you got the brace-free style wrong you’d probably just get a mangled string.
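
To make that concrete, here is a minimal sketch. It assumes a PHP build that 
includes the \u{...} escape as proposed in the RFC, and it uses the existing \x 
escape to stand in for the brace-free style. The eval() call is only there so 
the compile failure can be caught and displayed; it is not part of the proposal.

```php
<?php
// Sketch only: assumes a PHP build with the braced \u escape from this RFC.

// Existing brace-free escape: \x takes one or two hex digits, so a typo
// in the second digit is accepted silently.
$intended = "\x41";   // the single byte 0x41, i.e. "A"
$typo     = "\x4G";   // byte 0x04 followed by a literal "G"

// Braced escape: a well-formed literal yields the UTF-8 bytes for U+0123.
$braced = "\u{0123}";

// A malformed braced literal (missing closing brace) is rejected when the
// code is compiled. eval() is used only so the failure can be caught here.
$rejected = false;
try {
    eval('$s = "\u{012";');
} catch (ParseError $e) {
    $rejected = true;
}

echo bin2hex($intended), "\n";
echo bin2hex($typo), "\n";
echo bin2hex($braced), "\n";
var_dump($rejected);
```

On such a build this should print 41, 0447 and c4a3, followed by bool(true): 
the \x typo slips through as two unintended bytes, while the broken braced 
literal never compiles.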

> One other thing that we could do is to allow code points to be named, with \U
> (capital 'U') eg:
> 
> echo "\U{arabic letter alef}\n";

Ooh, that’s an interesting idea. I believe Perl actually has this already, 
although it uses the \N syntax:

http://perldoc.perl.org/perlreref.html#ESCAPE-SEQUENCES

Is something like that what you have in mind?

> If you think that it is a bad idea, please update the RFC to say why this is a
> bad idea and so why it is not going to happen - for now.
> 
> It would be nice since a code point is just a big number without any really
> obvious meaning, but a name makes for greater clarity.
> 
> However: I suspect that interpreting this might be considerably slower which
> means slower compilation.

I’ll add it to the Future Scope part.

One issue with this, however, is that we’d have to include a Unicode info 
database from somewhere with the names of the characters. That’d probably mean 
requiring ICU or something like it, which the current patch doesn’t do.
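
As a rough illustration of that dependency, the sketch below resolves a name 
at run time. It assumes the intl extension (ICU) is loaded and exposes 
IntlChar; named_char() is a hypothetical helper made up for this example and 
is not part of the patch. A named escape would need the same lookup at 
compile time.

```php
<?php
// Sketch only: named_char() is a hypothetical helper, not part of the RFC
// patch. It assumes the intl extension (ICU) provides IntlChar.
function named_char(string $name): ?string
{
    if (!class_exists('IntlChar')) {
        return null; // no ICU, so no character-name database to consult
    }
    // Unicode character names are upper case, so normalise before lookup.
    $codePoint = IntlChar::charFromName(strtoupper($name));
    // charFromName() returns null for an unknown name.
    return $codePoint === null ? null : IntlChar::chr($codePoint);
}

$alef = named_char('arabic letter alef');
echo $alef === null ? "name lookup unavailable\n" : bin2hex($alef) . "\n";
```

With intl available this should print d8a7, the UTF-8 bytes for U+0627. 
Without it there is nothing to look the name up in, which is the problem 
described above.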
--
Andrea Faulds
http://ajf.me/

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
