On Tue, Mar 16, 2010 at 11:48 AM, dreamcat four <dreamc...@gmail.com> wrote:
> On Tue, Mar 16, 2010 at 8:30 AM, Lester Caine <les...@lsces.co.uk> wrote:
>> '3' is not a very processor friendly number, so working with 4 even though
>> wasteful on memory, does make perfect sense. How long is it since we had a
>> 640k limit on working memory? SERVERS should have a good amount of memory
>> for caching information anyway. SO is UTF-16 the right approach for
>> processing wide strings? It needs special code to handle everything wider
>> than 16 bits, but at what gain really? If all core functionality is handled
>> as 32 bit characters is there that much of an overhead over the additional
>> processing to get around strings of dissimilar sizes in UTF-16 ?
>
> Just to re-enforce some of Lester's points above here.
>
> 4-byte per character is never slower that 2-bytes per character... its
> faster if anything. Bear in mind that 4-byte has been the defacto size
> for all modern cpu registers / 32-bit microarchitectures since....
> like... Forever. Give a c compiler 4bytes of data... it'll say: thank
> you very much, and more of the same please! It keeps em happy ;)
>
> Sure UTF-16 can make sense. But only if your external representations
> are also in UTF-16. So whats the default Unicode settings for MYSQL,
> POSTGRE, etc? Well, are they always set to UTF-8, or UTF-16?
>
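To put rough numbers on the 2-byte vs 4-byte question above, here is a minimal sketch in Python (not PHP's internals; the sample strings and the helper names `encoded_sizes` and `utf16_code_units` are made up for illustration). It measures how many bytes the same text takes in UTF-8, UTF-16 and UTF-32, and shows that a character outside the Basic Multilingual Plane needs two 16-bit code units in UTF-16, which is the "special code" for characters wider than 16 bits that Lester mentions.

```python
# Hypothetical sample strings, chosen only for illustration.
SAMPLES = {
    "ascii": "Hello, PHP",
    "latin": "M\u00fcller caf\u00e9",    # precomposed u-umlaut and e-acute
    "cjk": "\u65e5\u672c\u8a9e",         # three CJK ideographs
    "emoji": "\U0001F600",               # U+1F600, outside the BMP
}


def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in each encoding (no byte-order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def utf16_code_units(text: str) -> int:
    """Count 16-bit code units; code points above U+FFFF take a surrogate pair."""
    return len(text.encode("utf-16-le")) // 2


for name, text in SAMPLES.items():
    print(name, len(text), "code points,",
          encoded_sizes(text), utf16_code_units(text), "UTF-16 units")
```

For mostly-ASCII web content, UTF-8 is the most compact of the three, while UTF-32 spends four bytes on every code point in exchange for a fixed width.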
To answer my own question, I have done some further research.

It seems that MySQL and PostgreSQL default to Latin-1 (an 8-bit superset of ASCII) and the 'C' locale (plain 7-bit ASCII) respectively. In other words, neither sets itself to any Unicode encoding by default.

In the case of PostgreSQL, the ASCII default is often overridden to UTF-8 by the distro / OS / package manager, which picks it up from the locale environment variables (e.g. $LANG). So in practice it's UTF-8. In the case of MySQL, it may be left as latin1, but most competent web developers choose to set it to UTF-8. Again, comparatively few people actively choose UTF-16.

The most common encoding issue people run into is that their web application sends UTF-8 encoded data to a database (usually MySQL) that still has the factory-default Latin-1 encoding. People who discover this almost always solve the problem by converting their databases to UTF-8.

As for text files on disk, if they are Unicode, they are most commonly UTF-8 too.

So why use UTF-16 as the internal Unicode representation in PHP? It doesn't make a lot of sense for most regular people who want to use PHP for their web applications. Unless they don't really care how slow it's gonna be converting everything, constantly...

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php