Re: [PHP-DEV] character/byte semantics was Re: PHP Unicode support design document-keeping existing functionality

Makoto Tozawa Fri, 26 Aug 2005 11:03:33 -0700

Yes.

Makoto


Tex Texin wrote:

Makoto,

ok, thanks. Now I see. You are saying that the multi-byte extension took the 
opposite approach and made existing str*() byte oriented and the mb_str*() 
character oriented.
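
The byte-oriented versus character-oriented distinction above can be illustrated with a minimal sketch. It is written in Python rather than PHP, purely as a language-neutral illustration; the variable names are hypothetical, and it makes no claim about how str*() or mb_str*() are implemented.

```python
# Language-neutral illustration (Python, not PHP): the same string measured
# in bytes versus in characters (code points).
text = "日本語"                      # three characters
utf8_bytes = text.encode("utf-8")   # each of these characters takes 3 bytes in UTF-8

byte_length = len(utf8_bytes)       # what a byte-oriented length function would report
char_length = len(text)             # what a character-oriented length function would report

# Slicing by bytes can cut a multi-byte character in half;
# slicing by characters cannot.
byte_prefix = utf8_bytes[:4]        # 3 bytes of the first character + 1 stray byte
char_prefix = text[:1]              # exactly the first character

print(byte_length, char_length)
print(byte_prefix.decode("utf-8", errors="replace"), char_prefix)
```

Code that needs byte units (buffer sizes, storage limits) and code that needs character units (display width, truncation) therefore wants different functions, which is why each migration path below involves renaming calls in one direction or the other.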

So we have:
s2m) Migration from singlebyte to multibyte: change code that needs character 
units to use mb*()

s2u) Migration from singlebyte to (new) Unicode: change code that needs byte 
units to use new functions.

m2u) Migration from multibyte to Unicode: change str*() code that needs byte 
units to use new functions and (perhaps) change mb*() functions back to str*().

Ugh! And some folks are using the multibyte extension as their current Unicode 
solution, so the m2u case additionally represents a migration from Unicode to 
Unicode when the PHP version changes.

That would bear some additional consideration.

But, in looking at the mb* doc, the str*() functions can be overloaded to use 
the mb*() character semantics. If a good number of users do that, then it is of 
little consequence either way (i.e., it is no-win), and that puts us back to the 
original proposal.

Is that right?

Tex Texin
Internationalization Architect,   Yahoo! Inc.
-----Original Message-----
From: Makoto Tozawa [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 25, 2005 5:30 PM
To: Tex Texin
Cc: 'Andrei Zmievski'; [EMAIL PROTECTED]; 'PHPDevelopers Mailing List'
Subject: Re: [PHP-DEV] Re: PHP Unicode support design document-keeping existing functionality

> If we don't make the functions provide reasonable behavior for unicode, then every program needs to be rewritten to change function names.

I agree. I asked it because the Backwards Compatibility section states the following:

"... the upgrade to Unicode-enabled PHP has to be transparent. This means that the existing data types and functions must work as they have always done."

For those functions written for single byte encoding, the upgrade to Unicode-enabled PHP will be transparent because the character semantics remain the same. For those functions written for multi byte encoding using mb_str*() functions, it will also be transparent.

It is okay if there is no way to save those functions written for multibyte encoding abusing the str*() functions.

Makoto


Tex Texin wrote:
1) Sorry, I am compelled to change the subject so all threads don't look the same.

2) It's a no-win situation. If we don't make the functions provide reasonable behavior for unicode, then every program needs to be rewritten to change function names. The number of places where hard coded constants (6) are used is probably much smaller.

At least this way most code does the right thing as-is.

Also, if you don't want functions to show unicode behavior, leave unicode off, and just convert the data to utf-8.

We do need to have functions that provide the raw byte length, so it will be available.

Tex Texin
Internationalization Architect,   Yahoo! Inc.


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
