Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-27 Thread Stanislav Malyshev
Hi! I'm not completely against it. It's just an incomplete solution. echo "\u{1F602}"; // won't output 😂 if the output encoding is not UTF-8 You can always use iconv/recode to bring it to every encoding you need (provided it supports full unicode range). I see this as a readability feature -
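To make the iconv suggestion concrete, here is a minimal sketch. It assumes PHP 7.0 or later (where the `\u{...}` escape from this RFC is available) and the iconv extension; the variable names and the choice of ISO-8859-1 as the target encoding are illustrative, not taken from the thread.

```php
<?php
// "\u{e9}" is U+00E9 (é); the escape always produces UTF-8, here the bytes C3 A9.
$utf8 = "\u{e9}";

// Re-encode for an output channel that expects ISO-8859-1, where é is the single byte E9.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

echo bin2hex($utf8), "\n";   // c3a9
echo bin2hex($latin1), "\n"; // e9
```

This only works when the target encoding can represent the code point; U+1F602, for example, has no ISO-8859-1 equivalent.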

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Dmitry Stogov
Maybe I misunderstood something, but why introduce Unicode escapes if the PHP engine doesn't support Unicode? Always converting such escapes into UTF-8 doesn't make sense for people who use other encodings for output, databases, etc. Thanks. Dmitry. On Tue, Nov 25, 2014 at 1:09

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Markus Fischer
On 24.11.14 23:09, Andrea Faulds wrote: Good evening, Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape I think the choice of \u{xx} is interesting, i.e. using '{' and '}'. Afaik, one of the current best practices is to use json_decode(), like so: $ cat test.php ?php var_dump(
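The test.php snippet above is cut off in the archive, so what follows is not a reconstruction of it. It is a separate minimal sketch of the json_decode() workaround next to the proposed escape, assuming PHP 7.0 or later and the json extension; the variable names are made up for illustration.

```php
<?php
// Workaround: let the JSON parser decode the \uXXXX escape.
// Single quotes keep the backslash literal so json_decode() sees it.
$viaJson = json_decode('"\u00e9"');

// JSON escapes are 16-bit, so a non-BMP code point needs a surrogate pair.
$emojiViaJson = json_decode('"\ud83d\ude02"');

// Escape from the RFC: one form for any code point, resolved at compile time.
$viaEscape = "\u{e9}";
$emojiViaEscape = "\u{1F602}";

var_dump($viaJson === $viaEscape);           // bool(true)
var_dump($emojiViaJson === $emojiViaEscape); // bool(true)
```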

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Andrea Faulds
On 25 Nov 2014, at 08:33, Dmitry Stogov dmi...@zend.com wrote: Maybe I misunderstood something, but why introduce Unicode escapes if the PHP engine doesn't support Unicode? We don't have Unicode strings which are made of codepoints rather than bytes, sure. But we do usually treat these

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Andrea Faulds
On 25 Nov 2014, at 08:33, Markus Fischer mar...@fischer.name wrote: On 24.11.14 23:09, Andrea Faulds wrote: Good evening, Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape I think the choice of \u{xx} is interesting, i.e. using '{' and '}'. Afaik, one of the current best

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Derick Rethans
On Mon, 24 Nov 2014, Sara Golemon wrote: On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote: Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape I'm okay with producing UTF-8 even though our strings are technically binary. As you state, UTF-8 is the de-facto encoding,

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Dmitry Stogov
On Tue, Nov 25, 2014 at 1:00 PM, Andrea Faulds a...@ajf.me wrote: On 25 Nov 2014, at 08:33, Dmitry Stogov dmi...@zend.com wrote: Maybe I misunderstood something, but why introduce Unicode escapes if the PHP engine doesn't support Unicode? We don't have Unicode strings which are made of

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Andrea Faulds
On 25 Nov 2014, at 10:32, Derick Rethans der...@php.net wrote: On Mon, 24 Nov 2014, Sara Golemon wrote: On the BMP versus SMP issue of \u styles, we addressed this in PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six hexit codepoints. e.g. "\u1234" === "\U001234"

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Andrea Faulds
On 25 Nov 2014, at 10:41, Dmitry Stogov dmi...@zend.com wrote: u8string tells that the whole string is UTF-8 encoded. Your Unicode escape proposal assumes just a UTF-8 code point, but the encoding of the whole string is still undefined. True. There’s an assumption there that you’re using a

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Alain Williams
On Tue, Nov 25, 2014 at 02:41:48PM +0400, Dmitry Stogov wrote: I'm not completely against it. It's just an incomplete solution. echo "\u{1F602}"; // won't output 😂 if the output encoding is not UTF-8 echo "Привет \u{1F602}"; // won't output anything useful if script encoding is not UTF-8

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Andrea Faulds
On 25 Nov 2014, at 11:20, Alain Williams a...@phcomp.co.uk wrote: I think that we need to clarify what we are talking about. What Andrea has proposed is a way of writing string constants. The characters in these strings will still be 8 bits wide; this means that there needs to be some

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Christoph Becker
Ivan Enderlin @ Hoa wrote: Le 24/11/2014 23:09, Andrea Faulds a écrit : Good evening, Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape It has a rationale section explaining why certain decisions were made, that I’d recommend you read in full. Excellent RFC, thank you for this

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Alain Williams
On Tue, Nov 25, 2014 at 11:25:17AM +, Andrea Faulds wrote: Well, we *do* already have a compile-time system for declaring encoding, the declare() construct. I missed that. Reading the documentation I confess that I do not really understand what the effect of declare(encoding=xxx) is.

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Derick Rethans
On Tue, 25 Nov 2014, Dmitry Stogov wrote: On Tue, Nov 25, 2014 at 1:00 PM, Andrea Faulds a...@ajf.me wrote: On 25 Nov 2014, at 08:33, Dmitry Stogov dmi...@zend.com wrote: Maybe I misunderstood something, but why introduce Unicode escapes if the PHP engine doesn't support Unicode?

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Yasuo Ohgaki
Hi all, On Tue, Nov 25, 2014 at 8:09 PM, Andrea Faulds a...@ajf.me wrote: non-BMP code points are more important than ever. Yes, they are! We (Japanese) have a number of them already. \u{code point} has a huge advantage. We do not have to care whether the code point value is in the BMP or not. i.e. We can do echo
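The echo example above is cut off in the archive. As a separate illustration of the point, here is a minimal sketch assuming PHP 7.0 or later; the two code points are picked for illustration and are not from the original message. The same `\u{...}` form covers a BMP and a non-BMP code point, and only the resulting UTF-8 byte count differs.

```php
<?php
$bmp    = "\u{3042}";  // U+3042 HIRAGANA LETTER A, inside the BMP
$nonBmp = "\u{20BB7}"; // U+20BB7, a CJK Extension B ideograph outside the BMP

// strlen() counts bytes, since PHP strings are byte strings.
echo strlen($bmp), "\n";     // 3
echo strlen($nonBmp), "\n";  // 4
echo bin2hex($nonBmp), "\n"; // f0a0aeb7
```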

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Andrea Faulds
On 25 Nov 2014, at 11:48, Derick Rethans der...@php.net wrote: I think "incomplete" hits the nail on the head. Without proper Unicode support in the parser, compiler and string function semantics, having these escape codes doesn't really do a lot for us. How so? Why are they less useful because

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Dmitry Stogov
On Tue, Nov 25, 2014 at 2:18 PM, Andrea Faulds a...@ajf.me wrote: On 25 Nov 2014, at 10:41, Dmitry Stogov dmi...@zend.com wrote: u8string tells that the whole string is UTF-8 encoded. Your Unicode escape proposal assumes just a UTF-8 code point, but the whole string encoding is still

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-25 Thread Sara Golemon
On Tue, Nov 25, 2014 at 3:20 AM, Alain Williams a...@phcomp.co.uk wrote: If we decide to support non-utf-8 encoding at compile time then we could extend the syntax a bit to allow the encoding to be specified, eg: \U{utf-8: arabic letter alef} \U{iso-8859-6: arabic letter alef}

[PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds
Good evening, Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape It has a rationale section explaining why certain decisions were made, that I’d recommend you read in full. Thanks! -- Andrea Faulds http://ajf.me/

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds
On 24 Nov 2014, at 22:09, Andrea Faulds a...@ajf.me wrote: Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape My apologies to you all, a small correction: The title of that email should’ve been “[RFC] Unicode Codepoint Escape Syntax” to match the title of the RFC, I missed out the

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Sara Golemon
On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote: Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape I'm okay with producing UTF-8 even though our strings are technically binary. As you state, UTF-8 is the de-facto encoding, and recognizing this is pretty reasonable. You

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds
On 24 Nov 2014, at 22:21, Sara Golemon poll...@php.net wrote: On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote: Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape I'm okay with producing UTF-8 even though our strings are technically binary. As you state, UTF-8 is

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Adam Harvey
On 24 November 2014 at 14:21, Sara Golemon poll...@php.net wrote: On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote: Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape I'm okay with producing UTF-8 even though our strings are technically binary. As you state, UTF-8 is

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds
On 24 Nov 2014, at 22:30, Adam Harvey ahar...@php.net wrote: On 24 November 2014 at 14:21, Sara Golemon poll...@php.net wrote: On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote: Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape I'm okay with producing UTF-8 even

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Adam Harvey
On 24 November 2014 at 14:35, Andrea Faulds a...@ajf.me wrote: On 24 Nov 2014, at 22:30, Adam Harvey ahar...@php.net wrote: I'm also OK with this, although I do wonder if we should be respecting the user's default_charset setting instead. (Since default_charset defaults to UTF-8, in practice

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Sara Golemon
We would have to require ICU, but that might be worthwhile for PHP 7 anyway. Having at least one i18n API that's guaranteed to be available would be nice. It's 2014. I think requiring ICU is reasonable at this point. Orthogonal to this RFC, but I'd be in favor of deprecating all the non-ICU

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds
On 24 Nov 2014, at 23:19, Sara Golemon poll...@php.net wrote: We would have to require ICU, but that might be worthwhile for PHP 7 anyway. Having at least one i18n API that's guaranteed to be available would be nice. It's 2014. I think requiring ICU is reasonable at this point. I also

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Alain Williams
On Mon, Nov 24, 2014 at 02:21:37PM -0800, Sara Golemon wrote: On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote: Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape I'm okay with producing UTF-8 even though our strings are technically binary. As you state, UTF-8 is the

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds
On 24 Nov 2014, at 23:29, Alain Williams a...@phcomp.co.uk wrote: There is a big difference between \u or \U and \x or \o, and that is the number of characters that follow the escape. \x has 2, \o has 3 - both are short and easy to count with the eye. \U012345 is quite long and it is not so

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Alain Williams
On Mon, Nov 24, 2014 at 11:36:28PM +, Andrea Faulds wrote: On 24 Nov 2014, at 23:29, Alain Williams a...@phcomp.co.uk wrote: echo "\U{arabic letter alef}\n"; Ooh, that’s an interesting idea. I believe Perl actually has this already, although it uses the \N syntax:
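The \U{name} form above is a proposal made in this thread, not syntax confirmed here as part of PHP. A name-based lookup can still be approximated at runtime; the following is a minimal sketch, assuming PHP 7.0 or later with the intl extension loaded, using IntlChar rather than any escape syntax.

```php
<?php
// Requires the intl extension; IntlChar ships with it from PHP 7.0.
// Look up a code point by its Unicode character name.
$codePoint = IntlChar::charFromName('ARABIC LETTER ALEF'); // int 0x627

// Turn the code point into a UTF-8 encoded string.
$char = IntlChar::chr($codePoint);

printf("U+%04X %s\n", $codePoint, bin2hex($char)); // U+0627 d8a7
var_dump($char === "\u{0627}");                    // bool(true)
```

Unlike an escape sequence, this lookup happens at runtime and depends on ICU being available, which ties in with the ICU discussion elsewhere in the thread.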

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Sara Golemon
On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote: Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape I've linked a provisional HHVM implementation from that page. Planning to match whatever PHP7 does, of course, but for the moment I've added named entity support since it's

Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Ivan Enderlin @ Hoa
Le 24/11/2014 23:09, Andrea Faulds a écrit : Good evening, Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape It has a rationale section explaining why certain decisions were made, that I’d recommend you read in full. Excellent RFC, thank you for this proposal. I would suggest this