[PHP-DEV] RE : [PATCH] zend-multibyte unicode detection vs. __halt_compiler()

P Fri, 07 Sep 2007 07:47:09 -0700

Hi,

> From: Rui Hirokawa
> 
> IMHO, #42396 is not a bug, but it is the specification.
> The normal script doesn't contain a null byte if it is not 
> encoded in Unicode.
> 
> It is understandable the addition of a unique byte sequence 
> '0xFFFFFFFF' detection to support PHAR/PHK, 
> but it is a change to add a new feature.


Sorry to insist but, since __halt_compiler() was introduced, your assertion is 
not true any more.

Actually, it depends on what you consider to be 'the script'. If you only 
consider the data from the beginning of the file up to the __halt_compiler() 
directive, you are right: if this data contains a null byte, it is Unicode.

But the current Unicode detection is not aware of the __halt_compiler() 
directive, and it scans the whole file. So your assertion is wrong: it is 
perfectly legitimate for a non-Unicode script to contain null bytes (if they 
come after a __halt_compiler() directive). So it is a bug and not a feature 
request. This side effect was not identified when __halt_compiler() was added.
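
To make the problem concrete, here is a minimal sketch of the difference 
between scanning the whole file and scanning only the part that precedes 
__halt_compiler(). It is written in Python purely for illustration: the 
function names are hypothetical, and the "any null byte means Unicode" test is 
a simplification of the heuristic discussed in this thread, not the actual 
zend-multibyte code.

```python
# Illustration only: hypothetical helpers, not the Zend engine's real logic.

HALT_TOKEN = b"__halt_compiler"


def looks_unicode_whole_file(data: bytes) -> bool:
    """Simplified heuristic: a null byte anywhere in the file means Unicode."""
    return b"\x00" in data


def looks_unicode_before_halt(data: bytes) -> bool:
    """Same heuristic, restricted to the bytes before __halt_compiler().

    If the token is not found as plain ASCII (for example because the script
    really is UTF-16), the whole file is scanned as before.
    """
    pos = data.find(HALT_TOKEN)
    code = data if pos == -1 else data[:pos]
    return b"\x00" in code


# An ASCII script followed by a binary payload, as a PHAR/PHK-style
# archive might contain.
ascii_with_payload = (
    b"<?php echo 'hi'; __halt_compiler();" + b"\x00\x01\xff\x00binary payload"
)

# A script that really is encoded in UTF-16.
utf16_script = "<?php echo 'hi';".encode("utf-16-le")

print(looks_unicode_whole_file(ascii_with_payload))   # misdetected as Unicode
print(looks_unicode_before_halt(ascii_with_payload))  # treated as non-Unicode
print(looks_unicode_before_halt(utf16_script))        # still detected as Unicode
```

Whether the engine can actually locate __halt_compiler() at the point where 
encoding detection runs is a separate question, which this sketch does not 
address.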

The obvious solution is to decide that a non-Unicode script cannot contain null 
bytes, even after a __halt_compiler(). It would just require three lines in 
the PHP documentation. But that would introduce a severe limitation and, in 
practice, would make the __halt_compiler() feature almost useless.

The solution I am proposing is not very elegant, but it is the only one I found 
that does not make __halt_compiler() and multibyte incompatible. As 
__halt_compiler() was introduced recently and, afaict, the only software 
using it is PHAR and PHK, I consider it acceptable, if not perfect.

Greg, Marcus, do you have a better idea? I assumed that Unicode detection 
is done before __halt_compiler() can be detected; can you confirm?

Regards

Francois

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
