Re: [PHP-DEV] Zend Engine 3

2014-11-24 Thread Kalle Sommer Nielsen
2014-11-24 6:29 GMT+01:00 Xinchen Hui larue...@php.net:
 I don't understand why you're rushing this. Does any of your work depend on
 the number bump?

I don't see what makes it so different that we cannot do it now
instead of later, not like it will be a game changer


+1 for the change



-- 
regards,

Kalle Sommer Nielsen
ka...@php.net

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] Zend Engine 3

2014-11-24 Thread Ferenc Kovacs
On Mon, Nov 24, 2014 at 10:24 AM, Kalle Sommer Nielsen ka...@php.net
wrote:

 2014-11-24 6:29 GMT+01:00 Xinchen Hui larue...@php.net:
  I don't understand why you're rushing this. Does any of your work depend on
  the number bump?

 I don't see what makes it so different that we cannot do it now
 instead of later, not like it will be a game changer


 +1 for the change


+1, and given how Dmitry already bumped the other api versions used by the
ext developers (ZEND_MODULE_API_NO/ZEND_VERSION) I don't think that there
is any reason for us to be afraid to change it.
And bumping it early would actually give more time to the extension authors
to accommodate their code.


-- 
Ferenc Kovács
@Tyr43l - http://tyrael.hu


Re: [PHP-DEV] [RFC] Default constructors

2014-11-24 Thread Dmitry Stogov
I like the last patch. I think ZEND_ACC_STATIC flag must not make any
problems.

However, I thought of one more inconsistency. Your patch works fine for
parent:: calls but not for calls to a grandparent class.
In the following code the default constructor won't work.

class A {
}
class B extends A {
}
class C extends B {
  function __construct() {
    A::__construct(); // this won't work
  }
}

It's not a big problem to fix implementation to support it, or may be
support for parent:: is enough.
Anyway, it should be reflected in RFC (if this code should work or should
not).

Thanks. Dmitry.


On Fri, Nov 21, 2014 at 10:40 AM, Dmitry Stogov dmi...@zend.com wrote:

 thanks Stas. I'll think on next week.

 Dmitry.

 On Fri, Nov 21, 2014 at 6:59 AM, Stanislav Malyshev smalys...@gmail.com
 wrote:

 Hi!

  Additional check for ZEND_NULL_FUNCTION in DO_FCALL may be expensive.
  I think it must be better to use special predefined function (see
  zend_pass_function usage in zend_vm_def.h).

 I've made a different implementation here:

 https://github.com/smalyshev/php-src/compare/php:master...smalyshev:default_ctor_func?expand=1

 which uses zend_pass_function but I had to make it static since
 otherwise it's increase refcount for object and it doesn't seem like it
 decrements back. So I'm not sure if it's right, what do you think?

 --
 Stas Malyshev
 smalys...@gmail.com





Re: [PHP-DEV] Zend Engine 3

2014-11-24 Thread Nikita Popov
On Mon, Nov 24, 2014 at 1:10 AM, Andrea Faulds a...@ajf.me wrote:

 Good evening,

 Since phpng, int64, and perhaps other future changes in PHP 7 are a pretty
 big change, I think we ought to bump the major version number of the Zend
 Engine, from Zend Engine 2 to Zend Engine 3.

 I have a pull request open which would do this, although it needs updating
 to correct extensions checking for ZEND_ENGINE_2:
 https://github.com/php/php-src/pull/829

 Are there any objections to the idea? I realise work on the engine isn’t
 done, but that doesn’t mean we can’t name the new version. After all, we’ve
 named PHP 7, and it doesn’t exist yet, either.

 Thoughts?


Why do we need this define at all? Imho extensions should be checking
against the API version, rather than a ZEND_ENGINE_N constant. This is more
precise (it's not like extension code stays the same between minor
versions), but the ZEND_ENGINE_N constant also has the problem that it
targets only 5.x, even though the code it guards would usually be relevant
to 7.x as well.

Nikita
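
To make the suggestion above concrete, here is a minimal sketch of guarding
extension code on the module API number rather than on a ZEND_ENGINE_N define.
It is standalone C, so ZEND_MODULE_API_NO gets a stand-in value here (in a
real extension it comes from the engine headers), and EXAMPLE_PHPNG_API_NO and
example_uses_zend_string() are hypothetical names made up for illustration.

```c
/* Stand-in so the sketch compiles on its own; the number is illustrative. */
#ifndef ZEND_MODULE_API_NO
#define ZEND_MODULE_API_NO 20141001
#endif

/* Hypothetical threshold: the first API number that carries the new string API. */
#define EXAMPLE_PHPNG_API_NO 20141001

/* Returns 1 when the code path for the newer API was compiled in, 0 otherwise. */
static int example_uses_zend_string(void)
{
#if ZEND_MODULE_API_NO >= EXAMPLE_PHPNG_API_NO
    return 1;   /* newer engine: zend_string based code would go here */
#else
    return 0;   /* older engine: char* plus length based code would go here */
#endif
}
```

A numeric comparison like this keeps working across 5.x and 7.x alike, which
is the point made about ZEND_ENGINE_N being tied to one major series.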


Re: [PHP-DEV] Zend Engine 3

2014-11-24 Thread Paul Dragoonis
On 24 Nov 2014 12:18, Nikita Popov nikita@gmail.com wrote:

 On Mon, Nov 24, 2014 at 1:10 AM, Andrea Faulds a...@ajf.me wrote:

  Good evening,
 
  Since phpng, int64, and perhaps other future changes in PHP 7 are a pretty
  big change, I think we ought to bump the major version number of the Zend
  Engine, from Zend Engine 2 to Zend Engine 3.
 
  I have a pull request open which would do this, although it needs updating
  to correct extensions checking for ZEND_ENGINE_2:
  https://github.com/php/php-src/pull/829
 
  Are there any objections to the idea? I realise work on the engine isn’t
  done, but that doesn’t mean we can’t name the new version. After all, we’ve
  named PHP 7, and it doesn’t exist yet, either.
 
  Thoughts?
 

 Why do we need this define at all? Imho extensions should be checking
 against the API version, rather than a ZEND_ENGINE_N constant. This is more
 precise (it's not like extension code stays the same between minor
 versions), but the ZEND_ENGINE_N constant also has the problem that it
 targets only 5.x, even though the code it guards would usually be relevant
 to 7.x as well.

Paying to way for now asking someone to know the specific individual API
versions and their features, the convenience of saying I know ZE 3
supports this and ZE2 does not would be a worth while addition.

Especially for newcomers to ext dev.


 Nikita


Re: [PHP-DEV] Zend Engine 3

2014-11-24 Thread Paul Dragoonis
On 24 Nov 2014 12:32, Paul Dragoonis dragoo...@gmail.com wrote:


 On 24 Nov 2014 12:18, Nikita Popov nikita@gmail.com wrote:
 
  On Mon, Nov 24, 2014 at 1:10 AM, Andrea Faulds a...@ajf.me wrote:
 
   Good evening,
  
   Since phpng, int64, and perhaps other future changes in PHP 7 are a pretty
   big change, I think we ought to bump the major version number of the Zend
   Engine, from Zend Engine 2 to Zend Engine 3.
  
   I have a pull request open which would do this, although it needs updating
   to correct extensions checking for ZEND_ENGINE_2:
   https://github.com/php/php-src/pull/829
  
   Are there any objections to the idea? I realise work on the engine isn’t
   done, but that doesn’t mean we can’t name the new version. After all, we’ve
   named PHP 7, and it doesn’t exist yet, either.
  
   Thoughts?
  
 
  Why do we need this define at all? Imho extensions should be checking
  against the API version, rather than a ZEND_ENGINE_N constant. This is more
  precise (it's not like extension code stays the same between minor
  versions), but the ZEND_ENGINE_N constant also has the problem that it
  targets only 5.x, even though the code it guards would usually be relevant
  to 7.x as well.

 Paying to way for now asking someone to know the specific individual API

On my phone sorry. This should be: Paving the way to not asking someone

versions and their features, the convenience of saying I know ZE 3
supports this and ZE2 does not would be a worth while addition.

 Especially for newcomers to ext dev.

 
  Nikita


Re: [PHP-DEV] enhance fget to accept a callback

2014-11-24 Thread Thomas Hruska

On 11/23/2014 2:47 PM, Rowan Collins wrote:

For JSON, newlines aren't the delimiter you want, but with nested structures, 
I'm not sure how you'd parse a partial structure anyway. Are there JSON 
equivalents of SAX (event-based) parsers?


If JSON is encoded into another format, newlines can be a valid 
delimiter.  For example, JSON-Base64 uses newlines:


http://jb64.org/

JSON-Base64 is more for cross-application support where PHP isn't the 
only language in the mix.  If I'm moving data between two PHP hosts in a 
migration scenario, I'll tend to use serialize() and Base64 encoding, 
which preserves PHP objects across the network and requires less effort.


--
Thomas Hruska
CubicleSoft President

I've got great, time saving software that you will find useful.

http://cubiclesoft.com/




[PHP-DEV] Re: Use zend_string* for op_array->arg_info[].name and class_name

2014-11-24 Thread Nikita Popov
On Mon, Nov 17, 2014 at 10:25 AM, Dmitry Stogov dmi...@zend.com wrote:

 Hi,

 Please review the patch
 https://gist.github.com/dstogov/47a39aff37f0a6441ea0

 Thanks. Dmitry.


Hi Dmitry, sorry for late reply.

The problem we're trying to solve here is lack of ability to create a
zend_string at compile time. I just tried two ideas how we might be able to
do that, to avoid introducing different arginfos for internal/userland
functions.

My first approach was to create a zend_string as a string literal using
(zend_string *) ("\1\0\0\0" "\6\1\0\0" "\0\0\0\0" "\0\0\0\0" str) (for
32bit LE) and then update the length in zend_register_functions. However
this didn't work because the string literal ends up in readonly
memprotected memory, so this causes a segfault.

My second approach was to use a C99 compound literal to create a temporary
zend_string-like structure with the correct length:

#define ZEND_STRING_CT(str) \
    (zend_string *) (struct { \
        uint32_t refcount; uint32_t type_info; \
        zend_ulong h; size_t len; \
        char val[sizeof(str)]; \
    }[1]) {{ 1, IS_STRING | (IS_STR_PERSISTENT << 8), 0, sizeof(str)-1, str }}

This seems to work fine (patch
https://github.com/nikic/php-src/commit/5d49321cd9728e0cc1c2939432e46159f9a78472).
However it requires C99, which we're currently not allowed to use.

Maybe someone has an idea how this can be done in C89?

Nikita
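
For readers who want to try the compound-literal idea outside the engine, here
is a minimal standalone sketch. It assumes a C99 compiler, and demo_string,
DEMO_STRING_CT, demo_len and demo_equals are hypothetical stand-ins rather
than engine API. Like the macro above, it relies on the anonymous struct and
the target struct sharing a layout, which the C standard does not strictly
guarantee, so treat it as an illustration and not a portable technique.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical string header, loosely modelled on the fields named above. */
typedef struct {
    uint32_t refcount;
    uint32_t type_info;
    unsigned long h;      /* stand-in for zend_ulong */
    size_t len;
    char val[1];          /* struct-hack tail; real data extends past it */
} demo_string;

/* C99 only: an array-of-one compound literal whose val[] is sized from the
 * string literal, so len is known at compile time. */
#define DEMO_STRING_CT(str) \
    ((demo_string *) (struct { \
        uint32_t refcount; uint32_t type_info; \
        unsigned long h; size_t len; \
        char val[sizeof(str)]; \
    }[1]) {{ 1, 0, 0, sizeof(str) - 1, str }})

/* Length recorded in the header. */
static size_t demo_len(const demo_string *s)
{
    return s->len;
}

/* Compare the stored bytes against a plain C string. */
static int demo_equals(const demo_string *s, const char *expected)
{
    size_t n = strlen(expected);
    return s->len == n && memcmp(s->val, expected, n) == 0;
}
```

Inside a function the literal has automatic storage, so the pointer is only
valid until the enclosing block ends; at file scope it has static storage.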


[PHP-DEV] Re: Use zend_string* for op_array->arg_info[].name and class_name

2014-11-24 Thread Dmitry Stogov
Hi Nikita,

Thanks for the review. I had already thought about both approaches and failed
as well (the second also doesn't work with C++).
The proposed patch doesn't complicate the engine much (maybe only the
inheritance code), but I'm afraid of problems in some edge cases.

Thanks. Dmitry.

On Mon, Nov 24, 2014 at 5:27 PM, Nikita Popov nikita@gmail.com wrote:

 On Mon, Nov 17, 2014 at 10:25 AM, Dmitry Stogov dmi...@zend.com wrote:

 Hi,

 Please review the patch
 https://gist.github.com/dstogov/47a39aff37f0a6441ea0

 Thanks. Dmitry.


 Hi Dmitry, sorry for late reply.

 The problem we're trying to solve here is lack of ability to create a
 zend_string at compile time. I just tried two ideas how we might be able to
 do that, to avoid introducing different arginfos for internal/userland
 functions.

 My first approach was to create a zend_string as a string literal using
 (zend_string *) ("\1\0\0\0" "\6\1\0\0" "\0\0\0\0" "\0\0\0\0" str) (for
 32bit LE) and then update the length in zend_register_functions. However
 this didn't work because the string literal ends up in readonly
 memprotected memory, so this causes a segfault.

 My second approach was to use a C99 compound literal to create a temporary
 zend_string-like structure with the correct length:

 #define ZEND_STRING_CT(str) \
     (zend_string *) (struct { \
         uint32_t refcount; uint32_t type_info; \
         zend_ulong h; size_t len; \
         char val[sizeof(str)]; \
     }[1]) {{ 1, IS_STRING | (IS_STR_PERSISTENT << 8), 0, sizeof(str)-1, str }}

 This seems to work fine (patch
 https://github.com/nikic/php-src/commit/5d49321cd9728e0cc1c2939432e46159f9a78472).
 However it requires C99, which we're currently not allowed to use.

 Maybe someone has an idea how this can be done in C89?

 Nikita



Re: [PHP-DEV] [RFC] Default constructors

2014-11-24 Thread Rowan Collins

Dmitry Stogov wrote on 24/11/2014 09:56:

However, I thought of one more inconsistency. Your patch works fine for
parent:: calls but not for calls to a grandparent class.
In the following code the default constructor won't work.

class A {
}
class B extends A {
}
class C extends B {
   function __construct() {
     A::__construct(); // this won't work
   }
}


I guess some inconsistency like this is hard to avoid unless the default 
constructor is actually added to the class's method table, because the 
code has to specifically check for each case that is to be supported.


At risk of flogging a dead horse, this is why I was arguing for the lazy 
evaluation with new keyword to be abandoned, because it seems like 
that's the primary compatibility issue with adding a real default 
definition. Reflection would show the method as either internal or 
inherited from some implicit base class.


From a user's point of view there should really be no difference 
between "no constructor" and "constructor which does nothing", IMHO.


Regards,
--
Rowan Collins
[IMSoP]




Re: [PHP-DEV] [VOTE][RFC] Safe Casting Functions

2014-11-24 Thread Patrick ALLAERT
Le Wed Nov 19 2014 at 10:57:39 PM, Levi Morrison le...@php.net a écrit :

   - PHP suffers a lot from function bloat and this RFC provides
 multiple functions that do the same thing but differ only in how they
 handle errors. A simple validation of "can this be safely cast to an
 integer without dataloss?" avoids the issue entirely and would be fewer
 functions.


My "no" is mostly for the reason above, mentioned by Levi, and also because
of the use of Exception (previously mentioned by Derick as well).

To improve this RFC, I would:
- use errors/warnings instead of exceptions (unless there is a more global
approach on this aspect for PHP 7).
- improve ext/filter to have stricter validation/sanitizing mechanism if
this is missing.

I would personally appreciate a syntax closer to the current casting
mechanism, e.g.:

$int = (=int) 42; // result: (int) 42
$string = (=string) 42; // result: (string) 42
$int = (=int) "foobar"; // result: E_ERROR: Can not cast (string) "foobar" strictly to an int

$int = (~int) 42; // result: (int) 42
$int = (~int) "foobar"; // result: E_WARNING: Can not cast (string) "foobar" strictly to an int

Note that I haven't investigated any possible syntax conflicts.

My 2 €cents.

Regards,
Patrick


Re: [PHP-DEV] [VOTE][RFC] Safe Casting Functions

2014-11-24 Thread Andrea Faulds

 On 24 Nov 2014, at 16:08, Patrick ALLAERT patrickalla...@php.net wrote:
 
 
 Le Wed Nov 19 2014 at 10:57:39 PM, Levi Morrison le...@php.net a écrit :
   - PHP suffers a lot from function bloat and this RFC provides
 multiple functions that do the same thing but differ only in how they
 handle errors. A simple validation of "can this be safely cast to an
 integer without dataloss?" avoids the issue entirely and would be fewer
 functions.
 
 My no is mostly for the reason above, mentioned by Levi and also because of 
 the use of Exception (previously mentioned by Derick as well).
 
 To improve this RFC, I would:
 - use errors/warnings instead of exceptions (unless there is a more global 
 approach on this aspect for PHP 7).

Errors in PHP are horrible to handle. There’s absolutely no question of this 
RFC being revived using errors, at all. If I must, I’ll wait until exceptions 
are inevitably approved for core in PHP 7. Assuming they actually are. If they 
aren’t, I might actually quit PHP...

 I would personally appreciate a syntax closer to the current casting 
 mechanism, e.g.:
 
 $int = (=int) 42; // result: (int) 42
 $string = (=string) 42; // result: (string) 42
 $int = (=int) foobar; // result: E_ERROR: Can not cast (string) 
 foobar strictly to an int
 
 $int = (~int) 42; // result: (int) 42
 $int = (~int) foobar; // result: E_WARNING: Can not cast (string) 
 foobar strictly to an int

PHP already has enough bizarre syntaxes, I don’t think it needs even 
more.

--
Andrea Faulds
http://ajf.me/








Re: [PHP-DEV] [RFC] Default constructors

2014-11-24 Thread Stanislav Malyshev
Hi!

 However, I thought of one more inconsistency. Your patch works fine
 for parent:: calls but not for calls to a grandparent class.
 In the following code the default constructor won't work.

Yes, this is OK - the support is only for one pattern, calling the
parent, because it's what you're supposed to do. If you do anything
else, it would work (or not work) as before since it's not the best
practice so you're on your own.

 It's not a big problem to fix implementation to support it, or may be
 support for parent:: is enough.
 Anyway, it should be reflected in RFC (if this code should work or
 should not).

Sure, I'll add a note on RFC about it.
-- 
Stas Malyshev
smalys...@gmail.com




RE: [PHP-DEV] enhance fget to accept a callback

2014-11-24 Thread Rowan Collins
On 23 November 2014 23:36:30 GMT, Bill Salak b...@devtemple.com wrote:
The callback would be given the string as returned by fgets today. The
functional equivalent to fgetjson today is handled by something like 
$handle = fopen(~some file~, 'r');
while (($data = fgets($handle)) !== FALSE) {
$data = json_decode($data, true);
...other stuff...
}
and would change to
$handle = fopen(~some file~, 'r');
$decode = json_decode($data, true);
while (($data = fgets($handle,0,$decode)) !== FALSE) {
   ...other stuff...
}

Since you need a function reference for the callback, you'd actually need a 
closure to capture the options:

$decode = function($data) { return json_decode($data, true); };

This is actually more effort and code than the existing version, so I'm not 
sure what is gained.

Either way, the likelihood is you'd want to wrap this into a user function. As 
I mentioned earlier, making it into an Iterator is often useful, and 
potentially as simple as a generator function a bit like this:

function fjsoniterator($fh) {
    // Yield one decoded value per line until fgets() reports EOF.
    while ( ($line = fgets($fh)) !== false ) {
        yield json_decode($line, true);
    }
}

$fh = fopen(...);
foreach ( fjsoniterator($fh) as $data ) { ... }

Regards,
-- 
Rowan Collins
[IMSoP]





[PHP-DEV] AV on PHP 5.5.18 + Zend Opcache in accel_chdir

2014-11-24 Thread Eric Stenson
Internals folks--

Who owns Zend Opcache these days?  I've got a crash dump that appears to be a 
double-free of ZCG(cwd) during accel_chdir on PHP 5.5.18.  

Does this crash look familiar to anyone?

[windbg output]
0:000 .ecxr
eax= ebx=01b47cb0 ecx=77b12240 edx=01b0 esi=01b12f08 edi=01e6e6d0
eip=6bdab9e7 esp=0194ef2c ebp=0cff53b0 iopl=0 nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b efl=00210246
php5!_efree+0x27:
6bdab9e7 8b57f8  mov edx,dword ptr [edi-8] ds:002b:01e6e6c8=
0:000 k
  *** Stack trace for last set context - .thread/.cxr resets it
 # ChildEBP RetAddr  
00 0194ef34 6bd52647 php5!_efree+0x27 
[c:\php-sdk\php55\vc11\x86\php-5.5.18\zend\zend_alloc.c @ 2440]
01 0194f044 6bda6ee6 php_opcache!zif_accel_chdir+0x67 
[c:\php-sdk\php55\vc11\x86\php-5.5.18\ext\opcache\zendaccelerator.c @ 158]
02 0194f0a4 6bda6645 php5!zend_do_fcall_common_helper_SPEC+0x176 
[c:\php-sdk\php55\vc11\x86\php-5.5.18\zend\zend_vm_execute.h @ 550]
03 0194f0dc 6bdc11c1 php5!execute_ex+0x295 
[c:\php-sdk\php55\vc11\x86\php-5.5.18\zend\zend_vm_execute.h @ 363]
04 0194f184 6bde68d0 php5!zend_call_function+0x3c1 
[c:\php-sdk\php55\vc11\x86\php-5.5.18\zend\zend_execute_api.c @ 937]
05 0194f1b8 6bde6808 php5!call_user_function_ex+0x50 
[c:\php-sdk\php55\vc11\x86\php-5.5.18\zend\zend_execute_api.c @ 725]
06 0194f1f0 6bee5ae9 php5!call_user_function+0x58 
[c:\php-sdk\php55\vc11\x86\php-5.5.18\zend\zend_execute_api.c @ 699]
07 0194f224 6bdbf51b php5!user_shutdown_function_call+0x79 
[c:\php-sdk\php55\vc11\x86\php-5.5.18\ext\standard\basic_functions.c @ 5001]
08 0194f238 6bee17e8 php5!zend_hash_apply+0x1b 
[c:\php-sdk\php55\vc11\x86\php-5.5.18\zend\zend_hash.c @ 716]
09 0194f290 6bdba9dc php5!php_call_shutdown_functions+0x48 
[c:\php-sdk\php55\vc11\x86\php-5.5.18\ext\standard\basic_functions.c @ 5088]
0a 0194f5d0 01141443 php5!php_request_shutdown+0x6c 
[c:\php-sdk\php55\vc11\x86\php-5.5.18\main\main.c @ 1746]
0b 0194f764 0114420c php_cgi!main+0x443 
[c:\php-sdk\php55\vc11\x86\php-5.5.18\sapi\cgi\cgi_main.c @ 2505]
0c 0194f7a4 75f086e3 php_cgi!__tmainCRTStartup+0xfd 
[f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 536]
0d 0194f7b0 77b1be99 kernel32!BaseThreadInitThunk+0xe 
[d:\win8_gdr\base\win32\client\thread.c @ 65]
0e 0194f7f4 77b1be6c ntdll!__RtlUserThreadStart+0x72 
[d:\win8_gdr\minkernel\ntdll\rtlstrt.c @ 1024]
0f 0194f80c  ntdll!_RtlUserThreadStart+0x1b 
[d:\win8_gdr\minkernel\ntdll\rtlstrt.c @ 939]
0:000 .frame 1
01 0194f044 6bda6ee6 php_opcache!zif_accel_chdir+0x67 
[c:\php-sdk\php55\vc11\x86\php-5.5.18\ext\opcache\zendaccelerator.c @ 158]
0:000 dv
   ht = 0n1
 return_value = 0x0cf8e670
 return_value_ptr = 0x
 this_ptr = 0x
return_value_used = 0n0
  cwd = char [260] D:\home\site\wwwroot
0:000 dt php_opcache!accel_globals
   +0x000 function_table   : _hashtable
   +0x028 internal_functions_count : 0n1774
   +0x02c counted  : 0n0
   +0x030 enabled  : 0 ''
   +0x031 locked   : 0 ''
   +0x034 bind_hash: _hashtable
   +0x060 accel_directives : _zend_accel_directives
   +0x0b0 cwd  : 0x01e6e6d0  --- memory read error at address 
0x01e6e6d0 ---
   +0x0b4 cwd_len  : 0n20
[end windbg output]

Looks like accel_globals.cwd is pointing at free'd memory.

I looked through the existing bugs on Opcache, and I didn't see any that 
matched this crash.  I wanted to check with the internals alias before I opened 
the bug.

This is 5.5.18 NTS x86.

Thx!
--E.




[PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds
Good evening,

Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape

It has a rationale section explaining why certain decisions were made, that I’d 
recommend you read in full.

Thanks!
--
Andrea Faulds
http://ajf.me/








Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds

 On 24 Nov 2014, at 22:09, Andrea Faulds a...@ajf.me wrote:
 
 Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape

My apologies to you all, a small correction: The title of that email should’ve 
been “[RFC] Unicode Codepoint Escape Syntax” to match the title of the RFC, I 
missed out the “Codepoint”.
--
Andrea Faulds
http://ajf.me/








Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Sara Golemon
On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote:
 Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape

I'm okay with producing UTF-8 even though our strings are technically
binary.  As you state, UTF-8 is the de-facto encoding, and recognizing
this is pretty reasonable.

You may want to make it a requirement that strings containing \u
escapes are denoted as:   u"blah blah"   We set aside this format
back in the PHP6 days (note that b"blah" is equivalent to "blah" for
binary strings).

On the BMP versus SMP issue of \u styles, we addressed this in
PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six
hexit codepoints.   e.g.   "\u1234" === "\U001234"   I'd rather
follow this style than making \u special and different from hex and
octal notations by using braces.

-Sara




Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds

 On 24 Nov 2014, at 22:21, Sara Golemon poll...@php.net wrote:
 
 On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote:
 Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
 
 I'm okay with producing UTF-8 even though our strings are technically
 binary.  As you state, UTF-8 is the de-facto encoding, and recognizing
 this is pretty reasonable.

On that note, it strikes me now that we assume an encoding anyway for all 
escape sequences. If I’m using EBCDIC or UTF-16, “\n” isn’t going to help me 
much!

 You may want to make it a requirement that strings containing \u
 escapes are denoted as:   ublah blahWe set aside this format
 back in the PHP6 days (note that bblah is equivalent to blah for
 binary strings).

I’d rather keep u"blah blah" for if/when we add actual Unicode strings.

 On the BMP versus SMP issue of \u styles, we addressed this in
 PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six
 hexit codepoints.   e.g.\u1234 === \U001234   I'd rather
 follow this style than making \u special and different from hex and
 octal notations by using braces.

That is something I’d thought about. \U takes 8 hex digits in every other 
language which has it, though.

I suppose we could do this, it resolves the BMP issue, certainly. Still, I 
think the brace syntax has its advantages because it’s completely unambiguous 
and it means we only have one syntax for this, not two different ones (less 
mental overhead). Plus, it’s worth noting that \u would still be different from 
\ooo and \xXX anyway, as it’d be fixed-length while octal and hex aren’t.

--
Andrea Faulds
http://ajf.me/








Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Adam Harvey
On 24 November 2014 at 14:21, Sara Golemon poll...@php.net wrote:
 On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote:
 Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape

 I'm okay with producing UTF-8 even though our strings are technically
 binary.  As you state, UTF-8 is the de-facto encoding, and recognizing
 this is pretty reasonable.

I'm also OK with this, although I do wonder if we should be respecting
the user's default_charset setting instead. (Since default_charset
defaults to UTF-8, in practice this isn't a significant difference
for the average user.)

 You may want to make it a requirement that strings containing \u
 escapes are denoted as:   ublah blahWe set aside this format
 back in the PHP6 days (note that bblah is equivalent to blah for
 binary strings).

It seems to me that the point of \u and \U escapes is to embed Unicode
in potentially non-Unicode strings, so using u doesn't feel right.

 On the BMP versus SMP issue of \u styles, we addressed this in
 PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six
 hexit codepoints.   e.g.\u1234 === \U001234   I'd rather
 follow this style than making \u special and different from hex and
 octal notations by using braces.

I think I prefer the brace style, personally. Non-BMP codepoints have
become more important since PHP 6 (thanks, emoji), and having \u and
\U be case sensitive when \x isn't seems confusing.

Adam




Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds

 On 24 Nov 2014, at 22:30, Adam Harvey ahar...@php.net wrote:
 
 On 24 November 2014 at 14:21, Sara Golemon poll...@php.net wrote:
 On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote:
 Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
 
 I'm okay with producing UTF-8 even though our strings are technically
 binary.  As you state, UTF-8 is the de-facto encoding, and recognizing
 this is pretty reasonable.
 
 I'm also OK with this, although I do wonder if we should be respecting
 the user's default_charset setting instead. (Since default_charset
 defaults to UTF-8, in practice this isn't a significant difference
 for the average user.)

Ooh, that would be a possibility. That or using whatever encoding the source 
file is specified to be with declare(), so it matches the encoding of other 
characters in the string.

This’d add significant complexity to it, though (would we have to require ICU 
or something? D:), plus the vast majority of Unicode characters will only be 
supported by Unicode encodings… and of those, only UTF-8 is really in much use 
here anyway.

 You may want to make it a requirement that strings containing \u
 escapes are denoted as:   ublah blahWe set aside this format
 back in the PHP6 days (note that bblah is equivalent to blah for
 binary strings).
 
 It seems to me that the point of \u and \U escapes is to embed Unicode
 in potentially non-Unicode strings, so using u doesn't feel right.

I don’t really see where you’re coming from, it also makes just as much sense 
within Unicode strings. There are plenty of cases (like the U+202E or mañana 
examples in the RFC) where you’d want a Unicode escape in a Unicode string.

--
Andrea Faulds
http://ajf.me/








Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Adam Harvey
On 24 November 2014 at 14:35, Andrea Faulds a...@ajf.me wrote:

 On 24 Nov 2014, at 22:30, Adam Harvey ahar...@php.net wrote:
 I'm also OK with this, although I do wonder if we should be respecting
 the user's default_charset setting instead. (Since default_charset
 defaults to UTF-8, in practice this isn't a significant difference
 for the average user.)

 Ooh, that would be a possibility. That or using whatever encoding the source 
 file is specified to be with declare(), so it matches the encoding of other 
 characters in the string.

 This’d add significant complexity to it, though (would we have to require ICU 
 or something? D:), plus the vast majority of Unicode characters will only be 
 supported by Unicode encodings… and of those, only UTF-8 is really in much 
 use here anyway.

We would have to require ICU, but that might be worthwhile for PHP 7
anyway. Having at least one i18n API that's guaranteed to be available
would be nice.

 You may want to make it a requirement that strings containing \u
 escapes are denoted as:   ublah blahWe set aside this format
 back in the PHP6 days (note that bblah is equivalent to blah for
 binary strings).

 It seems to me that the point of \u and \U escapes is to embed Unicode
 in potentially non-Unicode strings, so using u doesn't feel right.

 I don’t really see where you’re coming from, it also makes just as much sense 
 within Unicode strings. There are plenty of cases (like the U+202E or mañana 
 examples in the RFC) where you’d want a Unicode escape in a Unicode string.

I probably worded that badly — I just mean that I don't think \u and
\U should be limited to only u strings, but should work in normal
strings as well. (In other words, I'm agreeing with what's in your
RFC, not with Sara.)

Adam




Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Sara Golemon
 We would have to require ICU, but that might be worthwhile for PHP 7
 anyway. Having at least one i18n API that's guaranteed to be available
 would be nice.

It's 2014.  I think requiring ICU is reasonable at this point.

Orthogonal to this RFC, but I'd be in favor of deprecating all the
non-ICU intl stuff sometime soon.

 I probably worded that badly — I just mean that I don't think \u and
 \U should be limited to only u strings, but should work in normal
 strings as well. (In other words, I'm agreeing with what's in your
 RFC, not with Sara.)

I don't feel strongly about the u requirement, it doesn't make the
world a darker place if we're more permissive.

 Plus, it’s worth noting that \u would still be different from \ooo and \xXX 
 anyway,
 as it’d be fixed-length while octal and hex aren’t.

And I really wish that weren't true. :p

-Sara




Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds

 On 24 Nov 2014, at 23:19, Sara Golemon poll...@php.net wrote:
 
 We would have to require ICU, but that might be worthwhile for PHP 7
 anyway. Having at least one i18n API that's guaranteed to be available
 would be nice.
 
 It's 2014.  I think requiring ICU is reasonable at this point.

I also think it would be reasonable to require ICU, especially as it means we 
could perhaps enable Joe Watkins’s UString by default, assuming it actually 
makes it into PHP 7.

That said, I don’t think we should go down the route of making \u convert to 
the current encoding. It doesn’t make much sense, if any, for non-Unicode 
encodings, and nobody is using UTF-16 or UTF-32. Plus, it’d be inconsistent, 
given we don’t convert any of the other escape sequences in strings anyway! It 
would be quite weird if \u{77} converted yet \x77 did not.
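
A minimal sketch of that point, assuming PHP 7.0 or later, where the \u{...} escape proposed in the RFC is available and always produces UTF-8. The variable names are illustrative only.

```php
<?php
// Requires PHP 7.0+ for the \u{...} escape syntax.

// U+0077 is ASCII, so its UTF-8 form is the single byte 0x77:
// here the two escapes produce identical strings.
$viaUnicodeEscape = "\u{77}";
$viaHexEscape     = "\x77";

// Above U+007F they diverge: \u{e9} is the two-byte UTF-8 sequence
// for U+00E9, while \xe9 is the single raw byte 0xE9.
$eAcuteUtf8 = "\u{e9}";
$rawByte    = "\xe9";

echo bin2hex($eAcuteUtf8), "\n";
echo bin2hex($rawByte), "\n";
```

Neither escape consults any "current encoding": \x always emits the byte it names, and \u{...} always emits UTF-8.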

--
Andrea Faulds
http://ajf.me/








Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Alain Williams
On Mon, Nov 24, 2014 at 02:21:37PM -0800, Sara Golemon wrote:
 On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote:
  Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
 
 I'm okay with producing UTF-8 even though our strings are technically
 binary.  As you state, UTF-8 is the de-facto encoding, and recognizing
 this is pretty reasonable.
 
 You may want to make it a requirement that strings containing \u
 escapes are denoted as:   u"blah blah"    We set aside this format
 back in the PHP6 days (note that b"blah" is equivalent to "blah" for
 binary strings).
 
 On the BMP versus SMP issue of \u styles, we addressed this in
 PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six
 hexit codepoints.   e.g. "\u1234" === "\U001234"   I'd rather
 follow this style than making \u special and different from hex and
 octal notations by using braces.

There is a big difference with \u or \U and \x or \o and that is the number of
characters that follow the escape. \x has 2, \o has 3 - both are short and easy
to count with the eye. \U012345 is quite long and it is not so visually obvious
where it should end.

Ergo: I prefer Andrea's \u{0123} as it is going to be more robust against 
typos.


One other thing that we could do is to allow code points to be named, with \U
(capital 'U') eg:

echo "\U{arabic letter alef}\n";

If you think that it is a bad idea, please update the RFC to say why this is a
bad idea and so why it is not going to happen - for now.

It would be nice since a code point is just a big number without any really
obvious meaning, but a name makes for greater clarity.

However: I suspect that interpreting this might be considerably slower, which
means slower compilation.

Regards

-- 
Alain Williams
Linux/GNU Consultant - Mail systems, Web sites, Networking, Programmer, IT Lecturer.
+44 (0) 787 668 0256  http://www.phcomp.co.uk/
Parliament Hill Computers Ltd. Registration Information: http://www.phcomp.co.uk/contact.php
#include <std_disclaimer.h>




Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Andrea Faulds

 On 24 Nov 2014, at 23:29, Alain Williams a...@phcomp.co.uk wrote:
 
 There is a big difference with \u or \U and \x or \o and that is the number of
 characters that follow the escape. \x has 2, \o has 3 - both are short and easy
 to count with the eye. \U012345 is quite long and it is not so visually obvious
 where it should end.
 
 Ergo: I prefer Andrea's \u{0123} as it is going to be more robust against typos.

Typos are an angle I hadn’t quite considered, but yes, this syntax is more robust
against them. Importantly, it’s a compile error if you produce a broken
literal, while if you screwed up the brace-free style you’d probably just get a
mangled string.
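
To illustrate why the braces make mistakes detectable, here is a rough runtime analogue of that strictness. Both functions are hypothetical helpers written for this example; they are not part of PHP or of the RFC patch, which does this work in the lexer at compile time.

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Surrogate code points are not rejected in this sketch.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical decoder: every "\u{" must be followed by 1-6 hex digits
// and a closing brace, otherwise the whole input is rejected.
function decode_unicode_escapes(string $input): string
{
    return preg_replace_callback(
        '/\\\\u\{([^}]*)(\}?)/',
        function (array $m): string {
            if ($m[2] !== '}' || !preg_match('/^[0-9A-Fa-f]{1,6}$/', $m[1])) {
                throw new InvalidArgumentException("Malformed escape: {$m[0]}");
            }
            $cp = hexdec($m[1]);
            if ($cp > 0x10FFFF) {
                throw new InvalidArgumentException("Code point out of range: {$m[0]}");
            }
            return codepoint_to_utf8($cp);
        },
        $input
    );
}
```

With this sketch, decode_unicode_escapes('ma\u{f1}ana') yields the UTF-8 string for "mañana", while a missing closing brace or a stray non-hex character raises an exception instead of silently yielding a different string. A brace-free fixed-width form has no such delimiter to check against.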

 One other thing that we could do is to allow code points to be named, with \U
 (capital 'U') eg:
 
 echo "\U{arabic letter alef}\n";

Ooh, that’s an interesting idea. I believe Perl actually has this already, 
although it uses the \N syntax:

http://perldoc.perl.org/perlreref.html#ESCAPE-SEQUENCES

Is something like that what you have in mind?

 If you think that it is a bad idea, please update the RFC to say why this is a
 bad idea and so why it is not going to happen - for now.
 
 It would be nice since a code point is just a big number without any really
 obvious meaning, but a name makes for greater clarity.
 
 However: I suspect that interpreting this might be considerably slower, which
 means slower compilation.

I’ll add it to the Future Scope part.

One issue with this, however, is that we’d have to include a Unicode info 
database from somewhere with the names of the characters. That’d probably mean 
requiring ICU or something like it, which the current patch doesn’t do.
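
For comparison, a name lookup done at runtime rather than in the lexer might look like the sketch below. It assumes the intl extension is loaded and uses the IntlChar class as it exists in PHP 7 and later, which may differ from any draft discussed in this thread; char_from_name() is a hypothetical wrapper, and the nullable return type needs PHP 7.1+.

```php
<?php
// Hypothetical wrapper: resolve a Unicode character name to a UTF-8 string.
// Returns null when the intl extension (and therefore ICU's name database)
// is unavailable, or when the name is unknown.
function char_from_name(string $name): ?string
{
    if (!class_exists('IntlChar')) {
        return null;
    }
    // Unicode character names are upper case, so normalise the input first.
    $codePoint = IntlChar::charFromName(strtoupper($name));
    return $codePoint === null ? null : IntlChar::chr($codePoint);
}

$alef = char_from_name('arabic letter alef');
echo $alef === null ? "lookup unavailable\n" : bin2hex($alef) . "\n";
```

The name table lives in ICU's data, which is the dependency described above: without it, the engine would need to bundle its own copy of the Unicode character names.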
--
Andrea Faulds
http://ajf.me/








Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Alain Williams
On Mon, Nov 24, 2014 at 11:36:28PM +, Andrea Faulds wrote:
 
  On 24 Nov 2014, at 23:29, Alain Williams a...@phcomp.co.uk wrote:

  echo "\U{arabic letter alef}\n";
 
 Ooh, that’s an interesting idea. I believe Perl actually has this already, 
 although it uses the \N syntax:
 
 http://perldoc.perl.org/perlreref.html#ESCAPE-SEQUENCES
 
 Is something like that what you have in mind?

Exactly.

Confession: it was looking at the perl documentation that led me to suggest it.

-- 
Alain Williams
Linux/GNU Consultant - Mail systems, Web sites, Networking, Programmer, IT Lecturer.
+44 (0) 787 668 0256  http://www.phcomp.co.uk/
Parliament Hill Computers Ltd. Registration Information: http://www.phcomp.co.uk/contact.php
#include <std_disclaimer.h>




Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Sara Golemon
On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds a...@ajf.me wrote:
 Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape

I've linked a provisional HHVM implementation from that page.
Planning to match whatever PHP7 does, of course, but for the moment
I've added named entity support since it's being discussed.

https://github.com/sgolemon/hhvm/compare/unicode-escape




[PHP-DEV] [RFC] IntlChar class and intl_char_*() functions

2014-11-24 Thread Sara Golemon
While playing around with Andrea's unicode literals syntax proposal, I
was reminded of just how little of ICU is exposed.  I've put up a
short proposal for adding an IntlChar class that exports ICU's character
APIs as static methods (with a matching non-OOP interface).

https://wiki.php.net/rfc/intl.char




Re: [PHP-DEV] [RFC] Unicode Escape Syntax

2014-11-24 Thread Ivan Enderlin @ Hoa

On 24/11/2014 23:09, Andrea Faulds wrote:

Good evening,

Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape

It has a rationale section explaining why certain decisions were made, that I’d 
recommend you read in full.

Excellent RFC, thank you for this proposal.
I would suggest this talk (you might already know it):
https://speakerdeck.com/mathiasbynens/hacking-with-unicode
It covers interesting concepts and limitations of current Unicode
implementations.
The usage of `\u{…}` fixes most of those limitations, and I could not agree
more with that notation!


Cheers.

--
Ivan Enderlin
Developer of Hoa
http://hoa-project.net/

PhD. at DISC/Femto-ST (Vesontio) and INRIA (Cassis)
http://disc.univ-fcomte.fr/ and http://www.inria.fr/

Member of HTML and WebApps Working Group of W3C
http://w3.org/


