LAUPRETRE François (P) wrote:
> Hi,
>
>> From: Rui Hirokawa
>>
>> IMHO, #42396 is not a bug, but it is the specification. The normal
>> script doesn't contain a null byte if it is not encoded in Unicode.
>>
>> The addition of detection for a unique byte sequence '0xFFFFFFFF'
>> to support PHAR/PHK is understandable, but it is a change that
>> adds a new feature.
>
> Sorry to insist but, since __halt_compiler() was introduced, your
> assertion is not true any more.
>
> Actually, it depends on what you consider as 'the script': if you
> just consider the data from the beginning of the file to the
> __halt_compiler() directive, that's right: if this data contains a
> null byte, it is Unicode.
>
> But the current Unicode detection is not aware of the
> __halt_compiler() directive, and it scans the whole file. So, your
> assertion is wrong: it is perfectly legitimate for a non-Unicode
> script to contain null bytes (if they come after a __halt_compiler()
> directive). So, it is a bug and not a feature request. This side
> effect was not identified when __halt_compiler() was added.
>
> The obvious solution is to decide that a non-Unicode script cannot
> contain null bytes, even after a __halt_compiler(). It would just
> require three lines in the PHP doc. But that would introduce a severe
> limitation and, in practice, would make the __halt_compiler() feature
> almost useless.
>
> The solution I am proposing is not very elegant, but it is the only
> one I found which does not make __halt_compiler() and multibyte
> incompatible. As __halt_compiler() was introduced recently, and as,
> afaict, the only software using it is PHAR and PHK, I consider it
> acceptable, if not perfect.
>
> Greg, Marcus, do you have a better idea? I assumed that Unicode
> detection is done before __halt_compiler() can be detected; can you
> confirm?
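
To make the disagreement concrete, here is a minimal sketch in Python (not PHP's actual code) of the two readings of "the script". It assumes the detector simply treats any null byte as a sign of a wide Unicode encoding, which is a simplification; the function names and the sample archive are hypothetical.

```python
# Simplified model of the null-byte heuristic discussed above.
# This is NOT the engine's real detection code; names are hypothetical.

HALT = b"__halt_compiler();"


def looks_unicode_whole_file(data: bytes) -> bool:
    """Current behaviour as described: scan the whole file for a null byte."""
    return b"\x00" in data


def looks_unicode_halt_aware(data: bytes) -> bool:
    """Hypothetical variant: only scan the part before __halt_compiler()."""
    pos = data.find(HALT)
    script = data if pos == -1 else data[:pos]
    return b"\x00" in script


# A single-byte script with a binary payload appended, as PHAR/PHK do.
archive = b"<?php echo 'stub'; " + HALT + b" ?>" + b"\x00\x01\x02binary payload"

print(looks_unicode_whole_file(archive))  # True: payload bytes trigger it
print(looks_unicode_halt_aware(archive))  # False: the script part is clean
```

The second function is not a drop-in fix: it searches for the ASCII bytes of the directive, whereas the real engine has to settle the encoding before the lexer can recognise __halt_compiler() at all.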

Unicode detection in mbstring is in fact done before __halt_compiler()
can be detected. I don't think there is a solution to this problem
without changes to PHP. Fortunately, PHP 6 introduces a use of declare
(please correct me if I'm wrong) that allows the file's encoding to be
declared, which would remove the guesswork.

I think the best thing in this case is to recommend that multibyte
auto-detection be disabled, and to wait for PHP 6, which provides a
proper solution to the Unicode encoding issue.

Greg

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php