Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

dreamcat four Tue, 16 Mar 2010 10:41:37 -0700

On Tue, Mar 16, 2010 at 11:48 AM, dreamcat four <dreamc...@gmail.com> wrote:
> On Tue, Mar 16, 2010 at 8:30 AM, Lester Caine <les...@lsces.co.uk> wrote:
>> '3' is not a very processor friendly number, so working with 4 even though
>> wasteful on memory, does make perfect sense. How long is it since we had a
>> 640k limit on working memory? SERVERS should have a good amount of memory
>> for caching information anyway. SO is UTF-16 the right approach for
>> processing wide strings? It needs special code to handle everything wider
>> than 16 bits, but at what gain really? If all core functionality is handled
>> as 32 bit characters is there that much of an overhead over the additional
>> processing to get around strings of dissimilar sizes in UTF-16 ?
>
> Just to reinforce some of Lester's points above here.
>
> 4 bytes per character is never slower than 2 bytes per character... it's
> faster if anything. Bear in mind that 4 bytes has been the de facto size
> for all modern CPU registers / 32-bit microarchitectures since....
> like... Forever. Give a C compiler 4 bytes of data... it'll say: thank
> you very much, and more of the same please! It keeps em happy ;)
>
> Sure UTF-16 can make sense. But only if your external representations
> are also in UTF-16. So what's the default Unicode setting for MySQL,
> PostgreSQL, etc? Well, are they always set to UTF-8, or UTF-16?
>


To answer my own question, I have done some further research.

It seems that MySQL and PostgreSQL default to latin1 (ISO-8859-1, an
8-bit superset of ASCII) and the 'C' locale (7-bit ASCII) respectively.
That is to say, neither sets itself to any Unicode encoding by default.

In the case of PostgreSQL, the ASCII default is often overridden to UTF-8
by the distro / OS / package managers, which take it from the locale
environment variables (e.g. $LANG). So then it's UTF-8.

In the case of MySQL, it may be left as latin1. But most competent web
developers decide to set it to UTF-8. Again, it's not generally
believed that very many people (by comparison) actively choose
UTF-16. The most common encoding issue people run into is that their
web application has sent their database UTF-8 encoded data, but their
database (usually MySQL) still has the factory default encoding,
latin1. People who discover this almost always solve the problem by
converting their databases to UTF-8.
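
To make that mismatch concrete, here is a minimal sketch in Python. It
is not MySQL itself: Python's "latin-1" codec stands in for a latin1
column, and the sample string is my own hypothetical choice. It only
shows what happens when UTF-8 bytes get interpreted as Latin-1, and
why the damage is reversible as long as nothing was truncated.

```python
# Hypothetical illustration: UTF-8 bytes from a web app read back as Latin-1.
# Python's "latin-1" codec is a stand-in for a latin1 database column.
text = "café"

# What the web application sends: 'é' becomes the two bytes C3 A9.
utf8_bytes = text.encode("utf-8")

# What a Latin-1 consumer believes it holds: one character per byte.
misread = utf8_bytes.decode("latin-1")

# Repair: undo the wrong decode to get the raw bytes back,
# then decode them as the UTF-8 they always were.
repaired = misread.encode("latin-1").decode("utf-8")

print(repr(misread))   # 'cafÃ©'
print(repr(repaired))  # 'café'
```

The mangled "cafÃ©" form is usually the first visible symptom of this
problem.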

As for text files on disk, if they are Unicode, they are most commonly
UTF-8 too. So then, why use UTF-16 as the internal Unicode representation
in PHP? It doesn't really make a lot of sense for most regular people
who want to use PHP for their web application. Unless they don't
really care how slow it's gonna be converting everything, constantly...
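
For anyone who wants to see the size trade-offs being argued about,
here is a rough sketch in Python (not PHP, and nothing to do with how
the PHP 6 internals actually store strings). The sample strings and
the helper names encoded_sizes and utf16_code_units are my own
hypothetical choices. It prints how many bytes each string takes in
UTF-8, UTF-16 and UTF-32, and shows that a character outside the
16-bit range needs two UTF-16 code units.

```python
# Hypothetical sample strings; any text would do.
samples = {
    "ascii": "hello",
    "accented": "café",
    "cjk": "日本語",
    "emoji": "😀",  # U+1F600, outside the 16-bit range
}

def encoded_sizes(s):
    """Return the byte length of s in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(s.encode("utf-8")),
        "utf-16": len(s.encode("utf-16-le")),
        "utf-32": len(s.encode("utf-32-le")),
    }

def utf16_code_units(s):
    """Return how many 16-bit code units s occupies in UTF-16."""
    return len(s.encode("utf-16-le")) // 2

for name, s in samples.items():
    print(name, len(s), encoded_sizes(s), utf16_code_units(s))
```

ASCII-heavy text is smallest in UTF-8, CJK text is smallest in UTF-16,
and UTF-32 is the only one of the three where every code point has the
same width.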

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
