> On 24 Nov 2014, at 23:29, Alain Williams <a...@phcomp.co.uk> wrote:
>
> There is a big difference with \u or \U and \x or \o and that is the number of
> characters that follow the escape. \x has 2, \o has 3 - both are short and easy
> to count with the eye. \U012345 is quite long and it is not so visually obvious
> where it should end.
>
> Ergo: I prefer Andrea's "\u{0123}" as it is going to be more robust against
> typos.

Typos are an angle I hadn’t quite considered, but yes, this syntax is more robust against them. Importantly, a broken literal is a compile error, whereas a mistake in the brace-free style would probably just give you a mangled string.

> One other thing that we could do is to allow code points to be named, with \U
> (capital 'U') eg:
>
> echo "\U{arabic letter alef}\n";

Ooh, that’s an interesting idea. I believe Perl actually has this already, although it uses the \N syntax:

http://perldoc.perl.org/perlreref.html#ESCAPE-SEQUENCES

Is something like that what you have in mind?

> If you think that it is a bad idea, please update the RFC to say why this is a
> bad idea and so why it is not going to happen - for now.
>
> It would be nice since a code point is just a big number without any really
> obvious meaning, but a name makes for greater clarity.
>
> However: I suspect that interpreting this might be considerably slower which
> means slower compilation.

I’ll add it to the Future Scope part. One issue with this, however, is that we’d have to include a Unicode info database from somewhere with the names of the characters. That’d probably mean requiring ICU or something like it, which the current patch doesn’t do.

--
Andrea Faulds
http://ajf.me/
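
P.S. To make the two points above a bit more concrete, here is a rough sketch in Python rather than PHP. It is not taken from the patch. The function name `expand_escapes` and the exact error messages are made up for illustration, and Python's bundled `unicodedata` module stands in for whatever name database (ICU or similar) a real implementation would need. It only aims to show that a brace-delimited escape can be rejected outright when malformed, and that a named escape needs a lookup table.

```python
import re
import unicodedata

# Matches \u or \U, optionally followed by a complete {...} group.
# If the braces are missing or unclosed, group 2 is None.
_ESCAPE = re.compile(r"\\([uU])(\{[^{}]*\})?")


def expand_escapes(literal: str) -> str:
    """Expand \\u{HEX} and \\U{character name} escapes (hypothetical helper).

    Raises ValueError on any malformed escape instead of passing it through.
    """

    def replace(match: re.Match) -> str:
        kind, group = match.group(1), match.group(2)
        if group is None:
            # e.g. "\u0123" or "\u{0123": no complete brace pair.
            raise ValueError(f"escape at offset {match.start()} lacks braces")
        body = group[1:-1]

        if kind == "u":
            # Numeric form: 1 to 6 hex digits, at most U+10FFFF.
            if not re.fullmatch(r"[0-9A-Fa-f]{1,6}", body):
                raise ValueError(f"invalid hex code point: {body!r}")
            code_point = int(body, 16)
            if code_point > 0x10FFFF:
                raise ValueError(f"code point out of range: {body!r}")
            return chr(code_point)

        # Named form: this is where the Unicode name database is required.
        try:
            return unicodedata.lookup(body.upper())
        except KeyError:
            raise ValueError(f"unknown character name: {body!r}") from None

    return _ESCAPE.sub(replace, literal)


if __name__ == "__main__":
    print(expand_escapes(r"\u{0123}") == "\u0123")
    print(expand_escapes(r"\U{arabic letter alef}") == "\u0627")
    try:
        expand_escapes(r"\u{0123")  # missing closing brace
    except ValueError as error:
        print("rejected:", error)
```

With the braces, a dropped or mistyped character turns into an error at the point of the escape, whereas a fixed-width form would simply consume whatever characters happen to follow.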