On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds <a...@ajf.me> wrote:
> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
>

I'm okay with producing UTF-8 even though our strings are technically binary. As you state, UTF-8 is the de facto encoding, and recognizing this is pretty reasonable.
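To make the "UTF-8 inside a binary string" point concrete, here is a minimal sketch. It is written in Python rather than PHP, only because Python's bytes type is a close stand-in for a PHP binary string. The helper name escape_to_utf8 is hypothetical and does not come from the RFC. It shows which bytes an escape for a given codepoint would contribute, assuming the output encoding is UTF-8.

```python
def escape_to_utf8(codepoint: int) -> bytes:
    """Return the UTF-8 bytes for one Unicode codepoint (hypothetical helper)."""
    # Values above U+10FFFF and UTF-16 surrogates are not Unicode scalar
    # values, so they have no valid UTF-8 encoding.
    if not 0 <= codepoint <= 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {codepoint:#x}")
    return chr(codepoint).encode("utf-8")


if __name__ == "__main__":
    print(escape_to_utf8(0x1234).hex(" "))   # a BMP codepoint
    print(escape_to_utf8(0x1F600).hex(" "))  # an SMP codepoint
```

The result is plain bytes: nothing in the string records that they came from an escape, which is why the choice of output encoding has to be fixed by the language.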
You may want to make it a requirement that strings containing \u escapes are denoted as:

u"blah blah"

We set aside this format back in the PHP6 days (note that b"blah" is equivalent to "blah" for binary strings).

On the BMP versus SMP issue of \uXXXX styles, we addressed this in PHP6 by making \u denote four-hexit BMP codepoints, while \U denoted six-hexit codepoints, e.g.

"\u1234" === "\U001234"

I'd rather follow this style than make \u special and different from the hex and octal notations by using braces.

-Sara
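The \u and \U convention described above can be sketched as a small decoder. This is an illustration in Python, not PHP, and it is not the PHP6 implementation: the function name expand_escapes and the regular expression are assumptions made for the example, and it ignores details a real lexer handles, such as an escaped backslash before the u.

```python
import re

# \u followed by exactly 4 hexits, or \U followed by exactly 6 hexits.
# This models the convention from the message; it is not released PHP syntax.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{6})")


def expand_escapes(source: str) -> bytes:
    """Expand both escape forms and return UTF-8 bytes (hypothetical helper)."""
    def replace(match: re.Match) -> str:
        hexits = match.group(1) or match.group(2)
        codepoint = int(hexits, 16)
        # Six hexits can exceed U+10FFFF, and surrogates cannot be encoded.
        if codepoint > 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
            raise ValueError("escape does not name a Unicode scalar value")
        return chr(codepoint)

    return _ESCAPE.sub(replace, source).encode("utf-8")


if __name__ == "__main__":
    # Raw strings keep the backslashes so the decoder sees the escapes.
    assert expand_escapes(r"\u1234") == expand_escapes(r"\U001234")
    print(expand_escapes(r"smile: \U01F600"))
```

Because each form has a fixed width, the decoder never needs a closing delimiter, which is the property that the brace syntax gives up in exchange for variable-length codepoints.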