[PHP-DEV] Re: RE : [PATCH] zend-multibyte unicode detection vs. __halt_compiler()

Greg Beaver Fri, 07 Sep 2007 12:41:27 -0700

LAUPRETRE François (P) wrote:
> Hi,
> 
>> From: Rui Hirokawa
>> 
>> IMHO, #42396 is not a bug; it is the specified behavior. A normal
>> script doesn't contain a null byte if it is not encoded in Unicode.
>> 
>> 
>> Adding detection of a unique byte sequence, '0xFFFFFFFF', to support
>> PHAR/PHK is understandable, but it is a change that adds a new
>> feature.
> 
> Sorry to insist but, since __halt_compiler() was introduced, your
> assertion is not true any more.
> 
> Actually, it depends on what you consider to be 'the script': if you
> just consider the data from the beginning of the file to the
> __halt_compiler() directive, that's right: if this data contains a
> null byte, it is Unicode.
> 
> But the current Unicode detection is not aware of the
> __halt_compiler() directive, and it scans the whole file. So, your
> assertion is wrong: it is perfectly legitimate for a non-Unicode
> script to contain null bytes (if they come after a __halt_compiler()
> directive). So, it is a bug and not a feature request. This side
> effect was not identified when __halt_compiler() was added.
> 
> The obvious solution is to decide that a non-Unicode script cannot
> contain null bytes, even after a __halt_compiler(). It would just
> require three lines in the PHP doc. But that would introduce a severe
> limitation and, in practice, would make the __halt_compiler() feature
> almost useless.
> 
> The solution I am proposing is not very elegant, but it is the only
> one I found that does not make __halt_compiler() and multibyte
> incompatible. As __halt_compiler() was introduced recently, and as,
> afaict, the only software using it is PHAR and PHK, I consider it
> acceptable, if not perfect.
> 
> Greg, Marcus, do you have a better idea? I assumed that Unicode
> detection is done before __halt_compiler() can be detected. Can you
> confirm?


Unicode detection in mbstring is in fact done before __halt_compiler()
is seen. I don't think there is a solution to this problem without
changes to PHP.
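
To make the disagreement concrete, here is a minimal sketch of the two
behaviours being discussed. It is written in Python purely for
illustration and is not the Zend engine code or the proposed patch. The
function names are hypothetical, and the plain byte search for
"__halt_compiler();" is a simplification: the real token is
case-insensitive, may contain whitespace, and can only be recognised by
the lexer, which runs after encoding detection.

```python
# Illustrative only: a null-byte heuristic applied to the whole file
# versus only to the part before __halt_compiler().

HALT_TOKEN = b"__halt_compiler();"  # simplified, exact-match form (assumption)


def script_portion(source: bytes) -> bytes:
    """Return the bytes up to and including the first halt token.

    If the token is not found, the whole input is treated as script.
    """
    idx = source.find(HALT_TOKEN)
    return source if idx == -1 else source[: idx + len(HALT_TOKEN)]


def looks_unicode_whole_file(source: bytes) -> bool:
    """Heuristic described in the thread: any null byte anywhere means Unicode."""
    return b"\x00" in source


def looks_unicode_script_only(source: bytes) -> bool:
    """Same heuristic, but ignoring data that follows the halt token."""
    return b"\x00" in script_portion(source)


# A single-byte script followed by binary payload, as PHAR/PHK produce.
archive = b"<?php echo 'stub'; __halt_compiler();" + b"\x00\x01payload\x00"
# An ordinary single-byte script with no payload.
plain = b"<?php echo 'hello';"
# A script genuinely stored as UTF-16 (every ASCII character carries a null byte).
utf16 = "<?php echo 'hi';".encode("utf-16-le")

whole_file_results = [looks_unicode_whole_file(s) for s in (archive, plain, utf16)]
script_only_results = [looks_unicode_script_only(s) for s in (archive, plain, utf16)]
```

In this sketch the whole-file check flags the archive as Unicode because
of its payload, while the script-only check does not. The UTF-16 input
is still flagged by both, because the ASCII halt token cannot be found
in it and the whole input is scanned.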

Fortunately, PHP 6 introduces a use of declare (please correct me if
I'm wrong) that lets a file declare its own encoding, which would
remove the guesswork.

I think the best thing in this case is to recommend disabling
multibyte auto-detection and to wait for PHP 6, which provides a
proper solution to the Unicode encoding issue.

Greg

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
