On 08/03/2015 12:17, Lester Caine wrote:
On 08/03/15 10:03, Grégory Planchat wrote:
Then using multiple encodings in the same script, or using the same
script for multiple encodings, becomes straightforward and standard.
Most PHP developers don't even know what Unicode or a character encoding
is; they just see "odd characters that are removed with a header() call
or utf8_decode()". No teasing intended, they just don't want to have to
handle this. PHP should not leave this sort of consideration to the sole
awareness of user-space developers.

Not part of THIS discussion exactly, but I have to take that in
isolation. 'Most PHP developers' need to be very aware of Unicode these
days. Simply pretending it does not exist is a dangerous exercise and
my own code base has been UTF8 for several years now. Even though I
don't speak anything but English, a large section of the material one
has to handle has characters which get lost if one does not maintain
UTF8 throughout the process. People are going on about 'data loss' when
converting, and that applies equally to strings as numbers.

The default encoding these days is UTF8 ...


This is not exactly what I meant, and your point is the way things should be, of course.

What I meant is that a text search or fetching the size of a string *MUST* behave the same way, whichever encoding you use, without having to know the actual encoding of the string at any time.

Currently, strlen() on a UTF-8 string behaves more like a C "sizeof(str) - 1" as soon as you use characters outside the ASCII range: it counts bytes, not characters.
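
To make the difference concrete, here is a small sketch comparing the byte count with the character count. It assumes the mbstring extension is loaded, and the "é" is written as explicit UTF-8 bytes so the result does not depend on how the file itself is saved:

```php
<?php
// "Grégory" with "é" spelled out as its two UTF-8 bytes (0xC3 0xA9).
$name = "Gr\xC3\xA9gory";

// strlen() counts bytes: 6 ASCII bytes + 2 bytes for "é".
$byteCount = strlen($name);             // 8

// mb_strlen() counts characters once it is told the encoding.
$charCount = mb_strlen($name, 'UTF-8'); // 7

echo $byteCount, " bytes, ", $charCount, " characters\n";
```

The two values only agree while the string stays within ASCII.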

The idea is really to make these statements work, whatever encoding you are using:

"Lorem ipsum dolor sit amet"->length();
"Lorem ipsum dolor sit amet"->search('lorem');
"Lorem ipsum dolor sit amet"->replace('lorem', 'Lorem');
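
As a rough illustration only, the same behaviour can be approximated today in user space with a small wrapper around the mbstring functions. The class name Text and its methods are hypothetical, made up for this sketch; it assumes mbstring is loaded and that the needle is given in the same encoding as the text:

```php
<?php
// Hypothetical wrapper: "Text" and its methods are illustrative
// assumptions, not an existing or proposed PHP API.
final class Text
{
    private $value;
    private $encoding;

    public function __construct($value, $encoding = 'UTF-8')
    {
        $this->value = $value;
        $this->encoding = $encoding;
    }

    // Number of characters, not bytes.
    public function length()
    {
        return mb_strlen($this->value, $this->encoding);
    }

    // Character offset of the first match, or false when absent.
    public function search($needle)
    {
        return mb_strpos($this->value, $needle, 0, $this->encoding);
    }

    // Replace by going through UTF-8, where a byte-wise str_replace()
    // cannot split a character, then convert back.
    public function replace($search, $replacement)
    {
        $utf8 = str_replace(
            mb_convert_encoding($search, 'UTF-8', $this->encoding),
            mb_convert_encoding($replacement, 'UTF-8', $this->encoding),
            mb_convert_encoding($this->value, 'UTF-8', $this->encoding)
        );

        return new self(
            mb_convert_encoding($utf8, $this->encoding, 'UTF-8'),
            $this->encoding
        );
    }

    public function __toString()
    {
        return $this->value;
    }
}

// The same text, "été ipsum", in two encodings.
$latin1 = new Text("\xE9t\xE9 ipsum", 'ISO-8859-1');
$utf8   = new Text("\xC3\xA9t\xC3\xA9 ipsum", 'UTF-8');

echo $latin1->length(), ' ', $utf8->length(), "\n";               // 9 9
echo $latin1->search('ipsum'), ' ', $utf8->search('ipsum'), "\n"; // 4 4
```

The caller still has to state the encoding once, at construction; after that, length, search and replace give the same answers for both strings.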

Grégory Planchat

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
