On 29/06/07, Tomas Kuliavas <[EMAIL PROTECTED]> wrote:
>> It comes down to predicting the future.  Whichever way we go, the
>> decision is going to be second-guessed.  If we have critical mass for
>> a clean BC break, then I am ok with it.  For me personally it would
>> make things a bit easier, but I think it would be a long long time
>> before we saw any large hosts out there switch to a PHP 6 that can't
>> run common PHP 5 apps.
>
> If they switch to 6 with unicode off, and never ever get around to
> turning unicode on, will it really be any better?
>
> They'll just be running some weird-o setup that causes all kinds of
> bugs and issues and you'll have users with php 6 apps that won't work
> in php 6 and who submit bogus bug reports about it, because of the
> setting.
>
> A clean break is probably better, especially if it makes php 6 much
> more maintainable.
>
> Large-scale hosts won't switch to 6 any faster than they switched to
> 5, unless there are ZERO BC breaks.
>
> And nobody can guarantee zero breaks, because there are always buglets.

A buglet is a small break, not something that requires a massive code
rewrite. Rewritten code is no longer backwards compatible, so developers
have to maintain two code branches or two different sets of libraries. If
the code is maintained in one branch, scripts will need wrapper functions
for most PHP string and stream function calls. Instead of taking a
performance loss in the interpreter, you force a performance loss on
portable scripts.
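
To make the wrapper cost concrete, here is a minimal sketch of what a
single-branch script might end up carrying. The helper name
portable_bytes() is hypothetical, and the two-argument call to
unicode_encode() is an assumption about the PHP 6 development API, not a
documented signature.

```php
<?php
// Hypothetical helper: return $s as a byte string under either engine.
// Assumption: PHP 6 exposes unicode_encode($string, $encoding). The
// function_exists() guard keeps that call unreachable on PHP 4/5.
function portable_bytes($s, $charset)
{
    if (function_exists('unicode_encode')) {
        return unicode_encode($s, $charset);
    }
    // PHP 4/5: strings are already byte strings.
    return $s;
}

// Every byte-oriented call now goes through the wrapper.
$length = strlen(portable_bytes("caf\xE9", 'ISO-8859-1'));
```

Each such call adds a function call and a function_exists() check that a
PHP 4/5-only script would not need.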

> The effort to have unicode off in 6 is probably larger than the effort
> to document what needs to be done to a PHP 5 app to make it be
> 6-friendly, or even write tools to auto-convert the bulk of a script.
>
> If unicode semantics are "on" what exactly is borked in PHP 5?

In Unicode mode, \[0-7]{1,3} and \x[0-9A-Fa-f]{1,2} refer to Unicode code
points and not to octal or hexadecimal byte values. The fix is not
backwards compatible.
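
For reference, a small sketch of the PHP 4/5 behaviour that existing
scripts depend on. The variable names are only illustrative.

```php
<?php
// In PHP 4/5 both escapes below produce the single byte 0xE9.
$hex   = "\xE9";
$octal = "\351";

var_dump($hex === $octal);   // the two literals are the same string
var_dump(strlen($hex));      // length is counted in bytes
var_dump(bin2hex($hex));     // hex dump of the raw byte
```

If the same literals denote code points instead, any code that relies on
them being single bytes changes meaning.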

Scripts can't match bytes. How are they supposed to check whether a string
is plain ASCII or 8-bit? Convert to ASCII and check for errors instead of
looking for 8-bit byte values? How can scripts replace 8-bit bytes with
other strings? The ISO-8859-2 decoding table contains 95 entries written
and evaluated as binary strings. The same applies to the other iso-8859
and windows-125x character sets. iso-8859-1 and utf-8 decoding does not
use mapping tables and performs complex calculations on byte values.
Multibyte character set decoding might actually benefit from
unicode_encode(), if Table 325 (http://www.php.net/unicode) provides more
information about U_INVALID_SUBSTITUTE and the other unicode.* settings.
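
A minimal sketch of the kind of byte-level code in question, as it works
in PHP 4/5. The function names and the choice of HTML numeric character
references as the target are assumptions for illustration, and only four
table entries are shown.

```php
<?php
// True if the byte string contains any byte with the high bit set.
function has_8bit($s)
{
    return preg_match('/[\x80-\xFF]/', $s) === 1;
}

// A few ISO-8859-2 entries, keyed by raw byte. A full table would
// list the remaining high bytes in the same way.
$iso8859_2 = array(
    "\xA1" => '&#260;',  // U+0104
    "\xA3" => '&#321;',  // U+0141
    "\xB1" => '&#261;',  // U+0105
    "\xB3" => '&#322;',  // U+0142
);

// Replace each mapped byte; pure ASCII input is returned untouched.
function decode_iso8859_2($s, $table)
{
    return has_8bit($s) ? strtr($s, $table) : $s;
}
```

Both the regular expression and the array keys depend on \x escapes
meaning bytes.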

PHP6 does not provide backwards compatible functions for working with
bytes. The constructs it provides are not backwards compatible. If scripts
want to do MIME Q encoding, they must work with bytes. Doing Q encoding
with the provided PHP extensions adds extra dependencies.
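
As an illustration of why Q encoding needs bytes, here is a rough sketch
of an RFC 2047 style encoder written against PHP 4/5 byte strings. The
function name is hypothetical, and the sketch ignores the 75-character
limit on encoded words.

```php
<?php
// Encode a byte string as a single "Q" encoded word.
function q_encode_word($bytes, $charset)
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $c = $bytes[$i];               // one byte per offset
        if ($c === ' ') {
            $out .= '_';               // space is written as underscore
        } elseif (preg_match('/^[A-Za-z0-9!*+\/-]$/', $c)) {
            $out .= $c;                // safe characters pass through
        } else {
            $out .= sprintf('=%02X', ord($c));  // everything else as =XX
        }
    }
    return '=?' . $charset . '?Q?' . $out . '?=';
}

echo q_encode_word("caf\xE9 au", 'ISO-8859-1'), "\n";
```

The loop only works if strlen(), string offsets and ord() all operate on
bytes of the already encoded text.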

ICU does not support an HTML target. Text conversion to iso-8859-x or
windows-125x targets will be lossy.

> Can that be fixed to be BC without resorting to this toggle?

Unicode and binary typecasts cause an E_PARSE error in PHP 5.2.0 and older.
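
A short sketch of the binary forms in question, assuming an engine newer
than 5.2.0. Because the failure is a parse error, no runtime version
check inside the same file can guard these lines.

```php
<?php
// Binary cast and binary string prefix. On engines that accept them
// without Unicode semantics, both yield an ordinary byte string.
$a = (binary) 'abc';
$b = b'abc';

var_dump($a === $b);
```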

PHP6 could have introduced new Unicode-aware functions, but the Unicode
implementation chose to modify the existing ones. All low-level string
operations ($string[1]) are Unicode aware by default, not only when the
script actually asks for it. Such an implementation is designed for
developers who don't care about Unicode support and want it out of the
box without any changes to their Unicode-unaware scripts. It is not
designed for developers who actually need it and want their code to work
in PHP6 and PHP4/5.
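
For contrast, a minimal sketch of the PHP 4/5 behaviour that scripts
currently rely on, where string offsets address bytes.

```php
<?php
// UTF-8 bytes for "été", written as escapes so the encoding of this
// file does not matter.
$s = "\xC3\xA9t\xC3\xA9";

var_dump(strlen($s));      // counts bytes, not characters
var_dump(bin2hex($s[1]));  // second byte of the first "é"
var_dump($s[2]);           // the "t" sits at byte offset 2
```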

Unicode code points can be defined with \u, but PHP6 breaks existing octal
and hex escape sequences.

PHP6 is very noisy about data stream and string operations ("Notice:
fwrite(): 13 character unicode buffer downcoded for binary stream
runtime_encoding", "Warning: base64_encode() expects parameter 1 to be
strictly a binary string, Unicode string given"), even when fwrite() or
base64_encode() works only with plain ASCII data. PHP script developers
are not used to strict variable type checks in string functions. Which
functions have been modified to require binary typecasting? Do I have to
make a list myself every time some function freaks out?


--
Tomas

The more I read about what is in place for PHP6 with regard to
Unicode, the more I feel Unicode should have been an extension included
in the core, rather than a rewrite of the core. Provide a series of
useful classes and functions. It is there if you want it, and as more and
more people get used to it, more use will be made of it. It almost looks
like all the time and energy (thank you to you all) that has been put
into PHP6 to make it Unicode aware will be wasted if it is disabled by
default. I also feel that if it is enabled by default and causes so
many BC breaks, no one will upgrade.



--
-----
Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
