Re: [PHP-DEV] Zend Engine 3
2014-11-24 6:29 GMT+01:00 Xinchen Hui larue...@php.net:
> I don't understand why you rush for it. Does any of your work depend on the number bumping?

I don't see what makes it so different that we cannot do it now instead of later; it's not like it will be a game changer.

+1 for the change

-- regards, Kalle Sommer Nielsen ka...@php.net

-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Zend Engine 3
On Mon, Nov 24, 2014 at 10:24 AM, Kalle Sommer Nielsen ka...@php.net wrote:
> 2014-11-24 6:29 GMT+01:00 Xinchen Hui larue...@php.net:
>> I don't understand why you rush for it. Does any of your work depend on the number bumping?
>
> I don't see what makes it so different that we cannot do it now instead of later; it's not like it will be a game changer.
>
> +1 for the change

+1, and given that Dmitry already bumped the other API versions used by extension developers (ZEND_MODULE_API_NO/ZEND_VERSION), I don't think there is any reason for us to be afraid to change it. Bumping it early would actually give extension authors more time to adapt their code.

-- Ferenc Kovács @Tyr43l - http://tyrael.hu
Re: [PHP-DEV] [RFC] Default constructors
I like the last patch. I don't think the ZEND_ACC_STATIC flag should cause any problems.

However, I thought of one more inconsistency. Your patch works fine for parent:: methods, but not for grandparent methods. In the following code the default constructor won't work:

    class A { }
    class B extends A { }
    class C extends B {
        function __construct() {
            A::__construct(); // this won't work
        }
    }

It's not a big problem to fix the implementation to support it, or maybe support for parent:: is enough. Either way, the RFC should state whether this code should or should not work.

Thanks. Dmitry.

On Fri, Nov 21, 2014 at 10:40 AM, Dmitry Stogov dmi...@zend.com wrote:
> Thanks Stas. I'll think about it next week. Dmitry.
>
> On Fri, Nov 21, 2014 at 6:59 AM, Stanislav Malyshev smalys...@gmail.com wrote:
>> Hi!
>>
>>> An additional check for ZEND_NULL_FUNCTION in DO_FCALL may be expensive. I think it would be better to use a special predefined function (see the zend_pass_function usage in zend_vm_def.h).
>>
>> I've made a different implementation here: https://github.com/smalyshev/php-src/compare/php:master...smalyshev:default_ctor_func?expand=1 which uses zend_pass_function, but I had to make it static since otherwise it increases the refcount for the object and doesn't seem to decrement it back. So I'm not sure if it's right, what do you think?
>>
>> -- Stas Malyshev smalys...@gmail.com
Re: [PHP-DEV] Zend Engine 3
On Mon, Nov 24, 2014 at 1:10 AM, Andrea Faulds a...@ajf.me wrote:
> Good evening,
>
> Since phpng, int64, and perhaps other future changes in PHP 7 are a pretty big change, I think we ought to bump the major version number of the Zend Engine, from Zend Engine 2 to Zend Engine 3. I have a pull request open which would do this, although it needs updating to correct extensions checking for ZEND_ENGINE_2: https://github.com/php/php-src/pull/829
>
> Are there any objections to the idea? I realise work on the engine isn’t done, but that doesn’t mean we can’t name the new version. After all, we’ve named PHP 7, and it doesn’t exist yet, either.
>
> Thoughts?

Why do we need this define at all? Imho extensions should be checking against the API version, rather than a ZEND_ENGINE_N constant. Not only is this more precise (it's not like extension code stays the same between minor versions), but the ZEND_ENGINE_N constant also has the problem that it targets only 5.x, even though the code it guards would usually be relevant to 7.x as well.

Nikita
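A minimal sketch of the kind of API-version check Nikita describes, as opposed to testing for a ZEND_ENGINE_N define. ZEND_MODULE_API_NO is the engine's module API number mentioned earlier in the thread; the fallback value, the threshold, and every `MYEXT_`/`myext_` name below are made up for illustration so the fragment compiles without the PHP headers.

```c
/* Stand-in so this compiles outside a PHP build tree. In a real
 * extension ZEND_MODULE_API_NO comes from the engine headers; the
 * value here is illustrative only, not a real API number. */
#ifndef ZEND_MODULE_API_NO
#define ZEND_MODULE_API_NO 20150101
#endif

/* Hypothetical threshold: the first API number that ships the feature
 * the extension wants to use. */
#define MYEXT_FIRST_API_WITH_FEATURE 20141001

/* Guard on the API number instead of on a ZEND_ENGINE_N constant. */
#if ZEND_MODULE_API_NO >= MYEXT_FIRST_API_WITH_FEATURE
#define MYEXT_HAS_FEATURE 1
#else
#define MYEXT_HAS_FEATURE 0
#endif

/* Reports which branch the preprocessor selected. */
int myext_has_feature(void)
{
    return MYEXT_HAS_FEATURE;
}
```

A guard written this way keeps working when the major engine version changes, which is the concern raised about code guarded by ZEND_ENGINE_2.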
Re: [PHP-DEV] Zend Engine 3
On 24 Nov 2014 12:18, Nikita Popov nikita@gmail.com wrote:
> On Mon, Nov 24, 2014 at 1:10 AM, Andrea Faulds a...@ajf.me wrote:
>> Good evening,
>>
>> Since phpng, int64, and perhaps other future changes in PHP 7 are a pretty big change, I think we ought to bump the major version number of the Zend Engine, from Zend Engine 2 to Zend Engine 3. I have a pull request open which would do this, although it needs updating to correct extensions checking for ZEND_ENGINE_2: https://github.com/php/php-src/pull/829
>>
>> Are there any objections to the idea? I realise work on the engine isn’t done, but that doesn’t mean we can’t name the new version. After all, we’ve named PHP 7, and it doesn’t exist yet, either.
>>
>> Thoughts?
>
> Why do we need this define at all? Imho extensions should be checking against the API version, rather than a ZEND_ENGINE_N constant. Not only is this more precise (it's not like extension code stays the same between minor versions), but the ZEND_ENGINE_N constant also has the problem that it targets only 5.x, even though the code it guards would usually be relevant to 7.x as well.

Paying to way for now asking someone to know the specific individual API versions and their features, the convenience of saying "I know ZE 3 supports this and ZE2 does not" would be a worthwhile addition. Especially for newcomers to ext dev.

> Nikita
Re: [PHP-DEV] Zend Engine 3
On 24 Nov 2014 12:32, Paul Dragoonis dragoo...@gmail.com wrote:
> On 24 Nov 2014 12:18, Nikita Popov nikita@gmail.com wrote:
>> On Mon, Nov 24, 2014 at 1:10 AM, Andrea Faulds a...@ajf.me wrote:
>>> Good evening,
>>>
>>> Since phpng, int64, and perhaps other future changes in PHP 7 are a pretty big change, I think we ought to bump the major version number of the Zend Engine, from Zend Engine 2 to Zend Engine 3. I have a pull request open which would do this, although it needs updating to correct extensions checking for ZEND_ENGINE_2: https://github.com/php/php-src/pull/829
>>>
>>> Are there any objections to the idea? I realise work on the engine isn’t done, but that doesn’t mean we can’t name the new version. After all, we’ve named PHP 7, and it doesn’t exist yet, either.
>>>
>>> Thoughts?
>>
>> Why do we need this define at all? Imho extensions should be checking against the API version, rather than a ZEND_ENGINE_N constant. Not only is this more precise (it's not like extension code stays the same between minor versions), but the ZEND_ENGINE_N constant also has the problem that it targets only 5.x, even though the code it guards would usually be relevant to 7.x as well.
>
> Paying to way for now asking someone to know the specific individual API

On my phone, sorry. This should be: "Paving the way to not asking someone"

> versions and their features, the convenience of saying "I know ZE 3 supports this and ZE2 does not" would be a worthwhile addition. Especially for newcomers to ext dev.
>
>> Nikita
Re: [PHP-DEV] enhance fget to accept a callback
On 11/23/2014 2:47 PM, Rowan Collins wrote:
> For JSON, newlines aren't the delimiter you want, but with nested structures, I'm not sure how you'd parse a partial structure anyway. Are there JSON equivalents of SAX (event-based) parsers?

If JSON is encoded into another format, newlines can be a valid delimiter. For example, JSON-Base64 uses newlines: http://jb64.org/

JSON-Base64 is more for cross-application support where PHP isn't the only language in the mix. If I'm moving data between two PHP hosts in a migration scenario, I'll tend to use serialize() and Base64 encoding, which preserves PHP objects across the network and requires less effort.

-- Thomas Hruska CubicleSoft President
I've got great, time saving software that you will find useful. http://cubiclesoft.com/
[PHP-DEV] Re: Use zend_string* for op_array->arg_info[].name and class_name
On Mon, Nov 17, 2014 at 10:25 AM, Dmitry Stogov dmi...@zend.com wrote:
> Hi, Please review the patch https://gist.github.com/dstogov/47a39aff37f0a6441ea0 Thanks. Dmitry.

Hi Dmitry, sorry for late reply.

The problem we're trying to solve here is lack of ability to create a zend_string at compile time. I just tried two ideas how we might be able to do that, to avoid introducing different arginfos for internal/userland functions.

My first approach was to create a zend_string as a string literal using

    (zend_string *) ("\1\0\0\0" "\6\1\0\0" "\0\0\0\0" "\0\0\0\0" str)

(for 32bit LE) and then update the length in zend_register_functions. However this didn't work because the string literal ends up in readonly memprotected memory, so this causes a segfault.

My second approach was to use a C99 compound literal to create a temporary zend_string-like structure with the correct length:

    #define ZEND_STRING_CT(str) \
        (zend_string *) (struct { \
            uint32_t refcount; uint32_t type_info; \
            zend_ulong h; size_t len; \
            char val[sizeof(str)]; \
        }[1]) {{ 1, IS_STRING | (IS_STR_PERSISTENT << 8), 0, sizeof(str)-1, str }}

This seems to work fine (patch https://github.com/nikic/php-src/commit/5d49321cd9728e0cc1c2939432e46159f9a78472). However it requires C99, which we're currently not allowed to use. Maybe someone has an idea how this can be done in C89?

Nikita
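The compound-literal approach can be tried outside the engine with a stand-in structure. The sketch below is not the patch itself: `demo_string`, `DEMO_STRING_CT` and `demo_check_literal` are hypothetical names, the type_info field is simply set to 0, and the cast relies on the two struct layouts matching, which is the same assumption the macro in the message makes. It needs a C99 (or later) compiler and a string literal as the macro argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for zend_string; the real definition lives in
 * the Zend headers and is not reproduced here. */
typedef struct {
    uint32_t refcount;
    uint32_t type_info;
    unsigned long h;   /* cached hash, 0 meaning "not computed yet" */
    size_t len;
    char val[1];       /* nominal size, the string data follows */
} demo_string;

/* Builds an anonymous struct with the same leading layout but with
 * val[] sized for the literal, then views it as a demo_string. */
#define DEMO_STRING_CT(str) \
    ((demo_string *) (struct { \
        uint32_t refcount; uint32_t type_info; \
        unsigned long h; size_t len; \
        char val[sizeof(str)]; \
    }[1]) {{ 1, 0, 0, sizeof(str) - 1, str }})

/* Returns 1 if the literal carries the expected length and bytes. */
int demo_check_literal(void)
{
    /* The compound literal lives until the end of this block. */
    demo_string *name = DEMO_STRING_CT("callback");
    assert(name->refcount == 1);
    return name->len == 8 && memcmp(name->val, "callback", 9) == 0;
}
```

The length comes from sizeof at compile time, so no runtime fix-up step is needed, which is what distinguishes this from the first approach described above.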
[PHP-DEV] Re: Use zend_string* for op_array->arg_info[].name and class_name
Hi Nikita,

Thanks for the review. I already thought about both approaches and failed as well (the second also doesn't work with C++).

The proposed patch doesn't complicate the engine a lot (maybe only the inheritance code), but I'm afraid of problems in some edge cases.

Thanks. Dmitry.

On Mon, Nov 24, 2014 at 5:27 PM, Nikita Popov nikita@gmail.com wrote:
> On Mon, Nov 17, 2014 at 10:25 AM, Dmitry Stogov dmi...@zend.com wrote:
>> Hi, Please review the patch https://gist.github.com/dstogov/47a39aff37f0a6441ea0 Thanks. Dmitry.
>
> Hi Dmitry, sorry for late reply.
>
> The problem we're trying to solve here is lack of ability to create a zend_string at compile time. I just tried two ideas how we might be able to do that, to avoid introducing different arginfos for internal/userland functions.
>
> My first approach was to create a zend_string as a string literal using
>
>     (zend_string *) ("\1\0\0\0" "\6\1\0\0" "\0\0\0\0" "\0\0\0\0" str)
>
> (for 32bit LE) and then update the length in zend_register_functions. However this didn't work because the string literal ends up in readonly memprotected memory, so this causes a segfault.
>
> My second approach was to use a C99 compound literal to create a temporary zend_string-like structure with the correct length:
>
>     #define ZEND_STRING_CT(str) \
>         (zend_string *) (struct { \
>             uint32_t refcount; uint32_t type_info; \
>             zend_ulong h; size_t len; \
>             char val[sizeof(str)]; \
>         }[1]) {{ 1, IS_STRING | (IS_STR_PERSISTENT << 8), 0, sizeof(str)-1, str }}
>
> This seems to work fine (patch https://github.com/nikic/php-src/commit/5d49321cd9728e0cc1c2939432e46159f9a78472). However it requires C99, which we're currently not allowed to use. Maybe someone has an idea how this can be done in C89?
>
> Nikita
Re: [PHP-DEV] [RFC] Default constructors
Dmitry Stogov wrote on 24/11/2014 09:56:
> However, I thought of one more inconsistency. Your patch works fine for parent:: methods, but not for grandparent methods. In the following code the default constructor won't work:
>
>     class A { }
>     class B extends A { }
>     class C extends B {
>         function __construct() {
>             A::__construct(); // this won't work
>         }
>     }

I guess some inconsistency like this is hard to avoid unless the default constructor is actually added to the class's method table, because the code has to specifically check for each case that is to be supported.

At the risk of flogging a dead horse, this is why I was arguing for the lazy evaluation with the new keyword to be abandoned, because it seems like that's the primary compatibility issue with adding a real default definition. Reflection would show the method as either internal or inherited from some implicit base class. From a user's point of view there should really be no difference between "no constructor" and "a constructor which does nothing", IMHO.

Regards, -- Rowan Collins [IMSoP]
Re: [PHP-DEV] [VOTE][RFC] Safe Casting Functions
On Wed Nov 19 2014 at 10:57:39 PM, Levi Morrison le...@php.net wrote:
> - PHP suffers a lot from function bloat, and this RFC provides multiple functions that do the same thing but differ only in how they handle errors. A simple validation of "can this be safely cast to an integer without data loss?" avoids the issue entirely and would mean fewer functions.

My "no" is mostly for the reason above, mentioned by Levi, and also because of the use of exceptions (previously mentioned by Derick as well).

To improve this RFC, I would:
- use errors/warnings instead of exceptions (unless there is a more global approach on this aspect for PHP 7).
- improve ext/filter to have a stricter validation/sanitizing mechanism if this is missing.

I would personally appreciate a syntax closer to the current casting mechanism, e.g.:

    $int = (=int) 42; // result: (int) 42
    $string = (=string) 42; // result: (string) 42
    $int = (=int) "foobar"; // result: E_ERROR: Can not cast (string) "foobar" strictly to an int
    $int = (~int) 42; // result: (int) 42
    $int = (~int) "foobar"; // result: E_WARNING: Can not cast (string) "foobar" strictly to an int

Note that I haven't investigated any possible syntax conflicts.

My 2 €cents. Regards, Patrick
Re: [PHP-DEV] [VOTE][RFC] Safe Casting Functions
On 24 Nov 2014, at 16:08, Patrick ALLAERT patrickalla...@php.net wrote:
> On Wed Nov 19 2014 at 10:57:39 PM, Levi Morrison le...@php.net wrote:
>> - PHP suffers a lot from function bloat, and this RFC provides multiple functions that do the same thing but differ only in how they handle errors. A simple validation of "can this be safely cast to an integer without data loss?" avoids the issue entirely and would mean fewer functions.
>
> My "no" is mostly for the reason above, mentioned by Levi, and also because of the use of exceptions (previously mentioned by Derick as well).
>
> To improve this RFC, I would:
> - use errors/warnings instead of exceptions (unless there is a more global approach on this aspect for PHP 7).

Errors in PHP are horrible to handle. There’s absolutely no question of this RFC being revived using errors, at all. If I must, I’ll wait until exceptions are inevitably approved for core in PHP 7. Assuming they actually are. If they aren’t, I might actually quit PHP...

> I would personally appreciate a syntax closer to the current casting mechanism, e.g.:
>
>     $int = (=int) 42; // result: (int) 42
>     $string = (=string) 42; // result: (string) 42
>     $int = (=int) "foobar"; // result: E_ERROR: Can not cast (string) "foobar" strictly to an int
>     $int = (~int) 42; // result: (int) 42
>     $int = (~int) "foobar"; // result: E_WARNING: Can not cast (string) "foobar" strictly to an int

PHP already has enough bizarre syntaxes, I don’t think it needs even more.

-- Andrea Faulds http://ajf.me/
Re: [PHP-DEV] [RFC] Default constructors
Hi!

> However, I thought of one more inconsistency. Your patch works fine for parent:: methods, but not for grandparent methods. In the following code the default constructor won't work.

Yes, this is OK - the support is only for one pattern, calling the parent, because it's what you're supposed to do. If you do anything else, it would work (or not work) as before; since it's not best practice, you're on your own.

> It's not a big problem to fix the implementation to support it, or maybe support for parent:: is enough. Either way, the RFC should state whether this code should or should not work.

Sure, I'll add a note to the RFC about it.

-- Stas Malyshev smalys...@gmail.com
RE: [PHP-DEV] enhance fget to accept a callback
On 23 November 2014 23:36:30 GMT, Bill Salak b...@devtemple.com wrote:
> The callback would be given the string as returned by fgets today. The functional equivalent to fgetjson today is handled by something like
>
>     $handle = fopen(~some file~, 'r');
>     while (($data = fgets($handle)) !== FALSE) {
>         $data = json_decode($data, true);
>         ...other stuff...
>     }
>
> and would change to
>
>     $handle = fopen(~some file~, 'r');
>     $decode = json_decode($data, true);
>     while (($data = fgets($handle,0,$decode)) !== FALSE) {
>         ...other stuff...
>     }

Since you need a function reference for the callback, you'd actually need a closure to capture the options:

    $decode = function($data) { return json_decode($data, true); };

This is actually more effort and code than the existing version, so I'm not sure what is gained.

Either way, the likelihood is you'd want to wrap this into a user function. As I mentioned earlier, making it into an Iterator is often useful, and potentially as simple as a generator function a bit like this:

    function fjsoniterator($fh) {
        // Loop until fgets() reports end of file or an error.
        while ( ($line = fgets($fh)) !== false ) {
            yield json_decode($line, true);
        }
    }

    $fh = fopen(...);
    foreach ( fjsoniterator($fh) as $data ) {
        ...
    }

Regards, -- Rowan Collins [IMSoP]
[PHP-DEV] AV on PHP 5.5.18 + Zend Opcache in accel_chdir
Internals folks--

Who owns Zend Opcache these days? I've got a crash dump that appears to be a double-free of ZCG(cwd) during accel_chdir on PHP 5.5.18. Does this crash look familiar to anyone?

[windbg output]
0:000> .ecxr
eax= ebx=01b47cb0 ecx=77b12240 edx=01b0 esi=01b12f08 edi=01e6e6d0
eip=6bdab9e7 esp=0194ef2c ebp=0cff53b0 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00210246
php5!_efree+0x27:
6bdab9e7 8b57f8 mov edx,dword ptr [edi-8] ds:002b:01e6e6c8=
0:000> k
*** Stack trace for last set context - .thread/.cxr resets it
 # ChildEBP RetAddr
00 0194ef34 6bd52647 php5!_efree+0x27 [c:\php-sdk\php55\vc11\x86\php-5.5.18\zend\zend_alloc.c @ 2440]
01 0194f044 6bda6ee6 php_opcache!zif_accel_chdir+0x67 [c:\php-sdk\php55\vc11\x86\php-5.5.18\ext\opcache\zendaccelerator.c @ 158]
02 0194f0a4 6bda6645 php5!zend_do_fcall_common_helper_SPEC+0x176 [c:\php-sdk\php55\vc11\x86\php-5.5.18\zend\zend_vm_execute.h @ 550]
03 0194f0dc 6bdc11c1 php5!execute_ex+0x295 [c:\php-sdk\php55\vc11\x86\php-5.5.18\zend\zend_vm_execute.h @ 363]
04 0194f184 6bde68d0 php5!zend_call_function+0x3c1 [c:\php-sdk\php55\vc11\x86\php-5.5.18\zend\zend_execute_api.c @ 937]
05 0194f1b8 6bde6808 php5!call_user_function_ex+0x50 [c:\php-sdk\php55\vc11\x86\php-5.5.18\zend\zend_execute_api.c @ 725]
06 0194f1f0 6bee5ae9 php5!call_user_function+0x58 [c:\php-sdk\php55\vc11\x86\php-5.5.18\zend\zend_execute_api.c @ 699]
07 0194f224 6bdbf51b php5!user_shutdown_function_call+0x79 [c:\php-sdk\php55\vc11\x86\php-5.5.18\ext\standard\basic_functions.c @ 5001]
08 0194f238 6bee17e8 php5!zend_hash_apply+0x1b [c:\php-sdk\php55\vc11\x86\php-5.5.18\zend\zend_hash.c @ 716]
09 0194f290 6bdba9dc php5!php_call_shutdown_functions+0x48 [c:\php-sdk\php55\vc11\x86\php-5.5.18\ext\standard\basic_functions.c @ 5088]
0a 0194f5d0 01141443 php5!php_request_shutdown+0x6c [c:\php-sdk\php55\vc11\x86\php-5.5.18\main\main.c @ 1746]
0b 0194f764 0114420c php_cgi!main+0x443 [c:\php-sdk\php55\vc11\x86\php-5.5.18\sapi\cgi\cgi_main.c @ 2505]
0c 0194f7a4 75f086e3 php_cgi!__tmainCRTStartup+0xfd [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 536]
0d 0194f7b0 77b1be99 kernel32!BaseThreadInitThunk+0xe [d:\win8_gdr\base\win32\client\thread.c @ 65]
0e 0194f7f4 77b1be6c ntdll!__RtlUserThreadStart+0x72 [d:\win8_gdr\minkernel\ntdll\rtlstrt.c @ 1024]
0f 0194f80c ntdll!_RtlUserThreadStart+0x1b [d:\win8_gdr\minkernel\ntdll\rtlstrt.c @ 939]
0:000> .frame 1
01 0194f044 6bda6ee6 php_opcache!zif_accel_chdir+0x67 [c:\php-sdk\php55\vc11\x86\php-5.5.18\ext\opcache\zendaccelerator.c @ 158]
0:000> dv
ht = 0n1
return_value = 0x0cf8e670
return_value_ptr = 0x
this_ptr = 0x
return_value_used = 0n0
cwd = char [260] D:\home\site\wwwroot
0:000> dt php_opcache!accel_globals
+0x000 function_table : _hashtable
+0x028 internal_functions_count : 0n1774
+0x02c counted : 0n0
+0x030 enabled : 0 ''
+0x031 locked : 0 ''
+0x034 bind_hash: _hashtable
+0x060 accel_directives : _zend_accel_directives
+0x0b0 cwd : 0x01e6e6d0 --- memory read error at address 0x01e6e6d0 ---
+0x0b4 cwd_len : 0n20
[end windbg output]

Looks like accel_globals.cwd is pointing at freed memory. I looked through the existing bugs on Opcache, and I didn't see any that matched this crash. I wanted to check with the internals alias before I opened the bug. This is 5.5.18 NTS x86.

Thx! --E.
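For readers unfamiliar with this class of bug, the sketch below shows the general pattern by which a cached pointer such as the cwd field in the dump is usually kept from dangling: clear it at the point where it is freed, so that a second release becomes a no-op. It is not a diagnosis of the crash and not Opcache code; `demo_globals`, `demo_set_cwd` and `demo_clear_cwd` are hypothetical, and plain malloc/free stand in for the engine's allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the part of the globals that caches the
 * current working directory. */
typedef struct {
    char *cwd;
    size_t cwd_len;
} demo_globals;

/* Releases the cached path and clears the pointer, so calling this
 * twice in a row is harmless (free(NULL) does nothing). */
void demo_clear_cwd(demo_globals *g)
{
    assert(g != NULL);
    free(g->cwd);
    g->cwd = NULL;
    g->cwd_len = 0;
}

/* Replaces the cached path with a private copy of `path`.
 * Returns 0 on success, -1 if the allocation fails. */
int demo_set_cwd(demo_globals *g, const char *path)
{
    assert(g != NULL && path != NULL);
    size_t len = strlen(path);
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return -1;          /* the old value is left untouched */
    }
    memcpy(copy, path, len + 1);
    demo_clear_cwd(g);      /* drop the previous value exactly once */
    g->cwd = copy;
    g->cwd_len = len;
    return 0;
}
```

With this shape, a shutdown path that runs after the pointer has already been released sees NULL rather than a stale address.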
[PHP-DEV] [RFC] Unicode Escape Syntax
Good evening, Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape It has a rationale section explaining why certain decisions were made, that I’d recommend you read in full. Thanks! -- Andrea Faulds http://ajf.me/
Re: [PHP-DEV] [RFC] Unicode Escape Syntax
On 24 Nov 2014, at 22:09, Andrea Faulds a...@ajf.me wrote:
> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape

My apologies to you all, a small correction: The title of that email should’ve been “[RFC] Unicode Codepoint Escape Syntax” to match the title of the RFC, I missed out the “Codepoint”.

-- Andrea Faulds http://ajf.me/
Re: [PHP-DEV] [RFC] Unicode Escape Syntax
On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote:
> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape

I'm okay with producing UTF-8 even though our strings are technically binary. As you state, UTF-8 is the de-facto encoding, and recognizing this is pretty reasonable.

You may want to make it a requirement that strings containing \u escapes are denoted as u"blah blah". We set aside this format back in the PHP6 days (note that b"blah" is equivalent to "blah" for binary strings).

On the BMP versus SMP issue of \u styles, we addressed this in PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six hexit codepoints, e.g. "\u1234" === "\U001234".

I'd rather follow this style than making \u special and different from hex and octal notations by using braces.

-Sara
Re: [PHP-DEV] [RFC] Unicode Escape Syntax
On 24 Nov 2014, at 22:21, Sara Golemon poll...@php.net wrote:
> On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote:
>> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
>
> I'm okay with producing UTF-8 even though our strings are technically binary. As you state, UTF-8 is the de-facto encoding, and recognizing this is pretty reasonable.

On that note, it strikes me now that we assume an encoding anyway for all escape sequences. If I’m using EBCDIC or UTF-16, “\n” isn’t going to help me much!

> You may want to make it a requirement that strings containing \u escapes are denoted as u"blah blah". We set aside this format back in the PHP6 days (note that b"blah" is equivalent to "blah" for binary strings).

I’d rather keep u"blah blah" for if/when we add actual Unicode strings.

> On the BMP versus SMP issue of \u styles, we addressed this in PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six hexit codepoints, e.g. "\u1234" === "\U001234".
>
> I'd rather follow this style than making \u special and different from hex and octal notations by using braces.

That is something I’d thought about. \U takes 8 hex digits in every other language which has it, though. I suppose we could do this, it resolves the BMP issue, certainly. Still, I think the brace syntax has its advantages because it’s completely unambiguous and it means we only have one syntax for this, not two different ones (less mental overhead). Plus, it’s worth noting that \u would still be different from \ooo and \xXX anyway, as it’d be fixed-length while octal and hex aren’t.

-- Andrea Faulds http://ajf.me/
Re: [PHP-DEV] [RFC] Unicode Escape Syntax
On 24 November 2014 at 14:21, Sara Golemon poll...@php.net wrote:
> On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote:
>> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
>
> I'm okay with producing UTF-8 even though our strings are technically binary. As you state, UTF-8 is the de-facto encoding, and recognizing this is pretty reasonable.

I'm also OK with this, although I do wonder if we should be respecting the user's default_charset setting instead. (Since default_charset defaults to UTF-8, in practice this isn't a significant difference for the average user.)

> You may want to make it a requirement that strings containing \u escapes are denoted as u"blah blah". We set aside this format back in the PHP6 days (note that b"blah" is equivalent to "blah" for binary strings).

It seems to me that the point of \u and \U escapes is to embed Unicode in potentially non-Unicode strings, so using u"" doesn't feel right.

> On the BMP versus SMP issue of \u styles, we addressed this in PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six hexit codepoints, e.g. "\u1234" === "\U001234".
>
> I'd rather follow this style than making \u special and different from hex and octal notations by using braces.

I think I prefer the brace style, personally. Non-BMP codepoints have become more important since PHP 6 (thanks, emoji), and having \u and \U be case sensitive when \x isn't seems confusing.

Adam
Re: [PHP-DEV] [RFC] Unicode Escape Syntax
On 24 Nov 2014, at 22:30, Adam Harvey ahar...@php.net wrote:
> On 24 November 2014 at 14:21, Sara Golemon poll...@php.net wrote:
>> On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote:
>>> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
>>
>> I'm okay with producing UTF-8 even though our strings are technically binary. As you state, UTF-8 is the de-facto encoding, and recognizing this is pretty reasonable.
>
> I'm also OK with this, although I do wonder if we should be respecting the user's default_charset setting instead. (Since default_charset defaults to UTF-8, in practice this isn't a significant difference for the average user.)

Ooh, that would be a possibility. That or using whatever encoding the source file is specified to be with declare(), so it matches the encoding of other characters in the string. This’d add significant complexity to it, though (would we have to require ICU or something? D:), plus the vast majority of Unicode characters will only be supported by Unicode encodings… and of those, only UTF-8 is really in much use here anyway.

>> You may want to make it a requirement that strings containing \u escapes are denoted as u"blah blah". We set aside this format back in the PHP6 days (note that b"blah" is equivalent to "blah" for binary strings).
>
> It seems to me that the point of \u and \U escapes is to embed Unicode in potentially non-Unicode strings, so using u"" doesn't feel right.

I don’t really see where you’re coming from, it also makes just as much sense within Unicode strings. There are plenty of cases (like the U+202E or mañana examples in the RFC) where you’d want a Unicode escape in a Unicode string.

-- Andrea Faulds http://ajf.me/
Re: [PHP-DEV] [RFC] Unicode Escape Syntax
On 24 November 2014 at 14:35, Andrea Faulds a...@ajf.me wrote:
> On 24 Nov 2014, at 22:30, Adam Harvey ahar...@php.net wrote:
>> I'm also OK with this, although I do wonder if we should be respecting the user's default_charset setting instead. (Since default_charset defaults to UTF-8, in practice this isn't a significant difference for the average user.)
>
> Ooh, that would be a possibility. That or using whatever encoding the source file is specified to be with declare(), so it matches the encoding of other characters in the string. This’d add significant complexity to it, though (would we have to require ICU or something? D:), plus the vast majority of Unicode characters will only be supported by Unicode encodings… and of those, only UTF-8 is really in much use here anyway.

We would have to require ICU, but that might be worthwhile for PHP 7 anyway. Having at least one i18n API that's guaranteed to be available would be nice.

>>> You may want to make it a requirement that strings containing \u escapes are denoted as u"blah blah". We set aside this format back in the PHP6 days (note that b"blah" is equivalent to "blah" for binary strings).
>>
>> It seems to me that the point of \u and \U escapes is to embed Unicode in potentially non-Unicode strings, so using u"" doesn't feel right.
>
> I don’t really see where you’re coming from, it also makes just as much sense within Unicode strings. There are plenty of cases (like the U+202E or mañana examples in the RFC) where you’d want a Unicode escape in a Unicode string.

I probably worded that badly. I just mean that I don't think \u and \U should be limited to only u"" strings, but should work in normal strings as well. (In other words, I'm agreeing with what's in your RFC, not with Sara.)

Adam
Re: [PHP-DEV] [RFC] Unicode Escape Syntax
> We would have to require ICU, but that might be worthwhile for PHP 7 anyway. Having at least one i18n API that's guaranteed to be available would be nice.

It's 2014. I think requiring ICU is reasonable at this point. Orthogonal to this RFC, but I'd be in favor of deprecating all the non-ICU intl stuff sometime soon.

> I probably worded that badly. I just mean that I don't think \u and \U should be limited to only u"" strings, but should work in normal strings as well. (In other words, I'm agreeing with what's in your RFC, not with Sara.)

I don't feel strongly about the u"" requirement, it doesn't make the world a darker place if we're more permissive.

> Plus, it’s worth noting that \u would still be different from \ooo and \xXX anyway, as it’d be fixed-length while octal and hex aren’t.

And I really wish that weren't true. :p

-Sara
Re: [PHP-DEV] [RFC] Unicode Escape Syntax
On 24 Nov 2014, at 23:19, Sara Golemon poll...@php.net wrote:
>> We would have to require ICU, but that might be worthwhile for PHP 7 anyway. Having at least one i18n API that's guaranteed to be available would be nice.
>
> It's 2014. I think requiring ICU is reasonable at this point.

I also think it would be reasonable to require ICU, especially as it means we could perhaps enable Joe Watkins’s UString by default, assuming it actually makes it into PHP 7.

That said, I don’t think we should go down the route of making \u convert to the current encoding. It doesn’t make much sense, if any, for non-Unicode encodings, and nobody is using UTF-16 or UTF-32. Plus, it’d be inconsistent, given we don’t convert any of the other escape sequences in strings anyway! It would be quite weird if \u{77} converted yet \x77 did not.

-- Andrea Faulds http://ajf.me/
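To make the "always produce UTF-8" position concrete, here is a small sketch of encoding one code point as UTF-8 bytes. It is not the RFC's patch; `demo_utf8_encode` is a hypothetical helper, and surrogate code points are deliberately not special-cased. It shows why \u{77} and \x77 would yield the same single byte: code points up to U+007F encode as themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes one code point as UTF-8 into `out`, which must have room
 * for 4 bytes. Returns the number of bytes written, or 0 if the value
 * is above U+10FFFF. Surrogates are not treated specially here. */
size_t demo_utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp <= 0x7F) {                 /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp <= 0x7FF) {                /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {               /* 3 bytes: 1110xxxx + 2 trailing */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {             /* 4 bytes: 11110xxx + 3 trailing */
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                         /* outside the Unicode range */
}
```

For instance, U+0077 comes out as the single byte 0x77, while U+202E (one of the RFC's examples mentioned in the thread) takes three bytes, E2 80 AE.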
Re: [PHP-DEV] [RFC] Unicode Escape Syntax
On Mon, Nov 24, 2014 at 02:21:37PM -0800, Sara Golemon wrote:
> On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote:
>> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
>
> I'm okay with producing UTF-8 even though our strings are technically binary. As you state, UTF-8 is the de-facto encoding, and recognizing this is pretty reasonable.
>
> You may want to make it a requirement that strings containing \u escapes are denoted as u"blah blah". We set aside this format back in the PHP6 days (note that b"blah" is equivalent to "blah" for binary strings).
>
> On the BMP versus SMP issue of \u styles, we addressed this in PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six hexit codepoints, e.g. "\u1234" === "\U001234".
>
> I'd rather follow this style than making \u special and different from hex and octal notations by using braces.

There is a big difference between \u or \U and \x or \o, and that is the number of characters that follow the escape. \x has 2, \o has 3 - both are short and easy to count with the eye. \U012345 is quite long and it is not so visually obvious where it should end. Ergo: I prefer Andrea's \u{0123} as it is going to be more robust against typos.

One other thing that we could do is to allow code points to be named, with \U (capital 'U'), e.g.:

    echo "\U{arabic letter alef}\n";

If you think that it is a bad idea, please update the RFC to say why this is a bad idea and so why it is not going to happen - for now.

It would be nice since a code point is just a big number without any really obvious meaning, but a name makes for greater clarity. However: I suspect that interpreting this might be considerably slower, which means slower compilation.

Regards

-- Alain Williams Linux/GNU Consultant - Mail systems, Web sites, Networking, Programmer, IT Lecturer. +44 (0) 787 668 0256 http://www.phcomp.co.uk/ Parliament Hill Computers Ltd. Registration Information: http://www.phcomp.co.uk/contact.php
#include <std_disclaimer.h>
Re: [PHP-DEV] [RFC] Unicode Escape Syntax
On 24 Nov 2014, at 23:29, Alain Williams a...@phcomp.co.uk wrote:

> There is a big difference with \u or \U and \x or \o and that is the number of characters that follow the escape. \x has 2, \o has 3 - both are short and easy to count with the eye. \U012345 is quite long and it is not so visually obvious where it should end. Ergo: I prefer Andrea's \u{0123} as it is going to be more robust against typos.

Typos are an angle I hadn’t quite considered, but yes, this syntax is better against that. Importantly, it’s a compile error if you produce a broken literal, while if you screwed up the brace-free style you’d probably just get a mangled string.

> One other thing that we could do is to allow code points to be named, with \U (capital 'U') eg:
>
>     echo "\U{arabic letter alef}\n";

Ooh, that’s an interesting idea. I believe Perl actually has this already, although it uses the \N syntax: http://perldoc.perl.org/perlreref.html#ESCAPE-SEQUENCES

Is something like that what you have in mind?

> If you think that it is a bad idea, please update the RFC to say why this is a bad idea and so why it is not going to happen - for now.
>
> It would be nice since a code point is just a big number without any really obvious meaning, but a name makes for greater clarity. However: I suspect that interpreting this might be considerably slower which means slower compilation.

I’ll add it to the Future Scope part. One issue with this, however, is that we’d have to include a Unicode info database from somewhere with the names of the characters. That’d probably mean requiring ICU or something like it, which the current patch doesn’t do.

--
Andrea Faulds
http://ajf.me/
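The "compile error rather than mangled string" property can be sketched in userland. This is only an illustration of the strictness being discussed, not the RFC's lexer code: the function name `expand_unicode_escapes` is hypothetical, it works on an already-read raw string rather than inside the compiler, and it assumes the mbstring extension for `mb_chr()` (PHP 7.2+).

```php
<?php
// Hypothetical helper: expands \u{...} escapes in a raw string and throws on
// malformed input instead of silently passing it through.
function expand_unicode_escapes(string $raw): string
{
    $out = '';
    $len = strlen($raw);
    for ($i = 0; $i < $len; $i++) {
        if (substr($raw, $i, 3) !== '\u{') {
            $out .= $raw[$i];               // ordinary byte, copy through
            continue;
        }
        $close = strpos($raw, '}', $i + 3);
        if ($close === false) {
            throw new UnexpectedValueException('Unterminated \u{ escape');
        }
        $hex = substr($raw, $i + 3, $close - $i - 3);
        if ($hex === '' || !ctype_xdigit($hex)) {
            throw new UnexpectedValueException("Invalid code point '$hex'");
        }
        $cp = hexdec($hex);                 // may be a float for very long input
        if ($cp > 0x10FFFF) {
            throw new UnexpectedValueException("Code point '$hex' is out of range");
        }
        $char = mb_chr((int) $cp, 'UTF-8');
        if ($char === false) {
            throw new UnexpectedValueException("Code point '$hex' cannot be encoded");
        }
        $out .= $char;
        $i = $close;                        // resume after the closing brace
    }
    return $out;
}

echo bin2hex(expand_unicode_escapes('caf\u{e9}')), "\n";   // 636166c3a9

try {
    expand_unicode_escapes('\u{12g4}');     // typo: 'g' is not a hex digit
} catch (UnexpectedValueException $e) {
    echo $e->getMessage(), "\n";            // Invalid code point '12g4'
}
```

Because the braces delimit the digits explicitly, a stray character is detectable as an error. With a fixed-width, brace-free form, the same typo would simply end the escape early and leave the remaining characters in the string.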
Re: [PHP-DEV] [RFC] Unicode Escape Syntax
On Mon, Nov 24, 2014 at 11:36:28PM +, Andrea Faulds wrote:

> On 24 Nov 2014, at 23:29, Alain Williams a...@phcomp.co.uk wrote:
>>     echo "\U{arabic letter alef}\n";
>
> Ooh, that’s an interesting idea. I believe Perl actually has this already, although it uses the \N syntax: http://perldoc.perl.org/perlreref.html#ESCAPE-SEQUENCES
>
> Is something like that what you have in mind?

Exactly. Confession: it was looking at the Perl documentation that led me to suggest it.

--
Alain Williams
Re: [PHP-DEV] [RFC] Unicode Escape Syntax
On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote:

> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape

I've linked a provisional HHVM implementation from that page. Planning to match whatever PHP7 does, of course, but for the moment I've added named entity support since it's being discussed.

https://github.com/sgolemon/hhvm/compare/unicode-escape
[PHP-DEV] [RFC] IntlChar class and intl_char_*() functions
While playing around with Andrea's Unicode literals syntax proposal, I was reminded of just how little of ICU is exposed. I've put up a short proposal for adding an IntlChar class that exports ICU's character APIs as static methods (with a matching non-OOP interface).

https://wiki.php.net/rfc/intl.char
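For reference, an IntlChar class along these lines did ship in PHP 7's intl extension. A minimal sketch, assuming PHP 7.0+ with intl loaded, of the name-based lookup raised in the Unicode escape thread:

```php
<?php
// Assumes PHP 7.0+ with the intl extension loaded.
$cp = IntlChar::charFromName('ARABIC LETTER ALEF');   // int code point, or null if unknown
printf("U+%04X\n", $cp);                              // U+0627
echo IntlChar::charName($cp), "\n";                   // ARABIC LETTER ALEF
echo bin2hex(IntlChar::chr($cp)), "\n";               // d8a7 (UTF-8 bytes)
```

The lookups happen at run time through ICU's character database, which is the dependency a name-based escape in string literals would also need.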
Re: [PHP-DEV] [RFC] Unicode Escape Syntax
On 24/11/2014 23:09, Andrea Faulds wrote:

> Good evening,
>
> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
>
> It has a rationale section explaining why certain decisions were made, that I’d recommend you read in full.

Excellent RFC, thank you for this proposal. I would suggest this talk, https://speakerdeck.com/mathiasbynens/hacking-with-unicode (you might already know it), which covers interesting concepts and limitations of current Unicode implementations. The usage of `\u{…}` fixes most limitations, and I could not agree more with that notation!

Cheers.

--
Ivan Enderlin
Developer of Hoa http://hoa-project.net/
PhD. at DISC/Femto-ST (Vesontio) and INRIA (Cassis) http://disc.univ-fcomte.fr/ and http://www.inria.fr/
Member of HTML and WebApps Working Group of W3C http://w3.org/