Rasmus Lerdorf wrote:
On 03/16/2010 12:05 PM, dreamcat four wrote:
On Tue, Mar 16, 2010 at 6:32 PM, Rasmus Lerdorf <ras...@lerdorf.com> wrote:
On 03/16/2010 10:40 AM, dreamcat four wrote:
As for text files on disk, if they are Unicode, they are most commonly
UTF-8 too. So then, why use UTF-16 as the internal Unicode representation
in PHP? It doesn't really make a lot of sense for most regular people
who want to use PHP for their web application. Unless they don't
really care how slow it's going to be converting everything, constantly...

Well, the obvious original reason is that ICU uses UTF-16 internally and
the logic was that we would be going in and out of ICU to do all the
various Unicode operations many more times than we would be interfacing
with external things like MySQL or files on disk.  You generally only
read or write a string once from an external source, but you may perform
multiple Unicode operations on that same string so avoiding a conversion
for each operation seems logical.

-Rasmus

It's only logical if you've bothered to profile the conversion calls to
ICU against the non-ICU conversion calls. I'm guessing the way to do
that is to have 2 versions of each conversion method. One used by
ICU, and another used everywhere else. The harder part is to find some
suitable, real-life PHP programs to test with.

You mean check to see how many actual Unicode operations a standard app
makes?  We did talk about that, but there is a bit of a chicken-and-egg
problem here.  Because PHP doesn't natively support Unicode, people
write apps in a way that lets them just pass Unicode through PHP and
deal with it elsewhere.  I would expect the profile to change once PHP
gets better support for Unicode.

But yes, some ideas around lazy conversions and other tricks would be
interesting.  If your input and output encoding are both UTF-8 and all
your data sources are UTF-8 and you never do any sort of string
manipulation on a particular string, why bother doing the UTF-8 to
UTF-16 conversion on that string?
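
A minimal sketch of the lazy-conversion idea, written in Python rather than
in the engine's C and not taken from any PHP source. The class name
LazyString, its methods, and the conversions counter are all hypothetical,
introduced only for illustration. The string keeps the UTF-8 bytes it was
read with and converts to UTF-16 only when a Unicode operation first needs it.

```python
class LazyString:
    """Hypothetical wrapper: keeps raw UTF-8 bytes, converts to UTF-16 on demand."""

    def __init__(self, raw: bytes):
        self._raw = raw        # bytes exactly as read from the external source
        self._utf16 = None     # cached UTF-16 form, built on first use
        self.conversions = 0   # illustration only: counts UTF-8 -> UTF-16 conversions

    def _as_utf16(self) -> bytes:
        # Convert once and cache, so repeated operations share one conversion.
        if self._utf16 is None:
            self._utf16 = self._raw.decode("utf-8").encode("utf-16-le")
            self.conversions += 1
        return self._utf16

    def output(self) -> bytes:
        # Pass-through path: UTF-8 in, UTF-8 out, no conversion at all.
        return self._raw

    def length_utf16(self) -> int:
        # A "Unicode operation": length in UTF-16 code units (2 bytes each).
        return len(self._as_utf16()) // 2

    def upper(self) -> "LazyString":
        # Another Unicode operation; reuses the cached UTF-16 form.
        text = self._as_utf16().decode("utf-16-le").upper()
        return LazyString(text.encode("utf-8"))
```

A string that is only read and written back never pays for a conversion,
while a string that is manipulated several times pays for it once.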

I think that is what I said originally ;)
When a string is read in, you set an extra flag if it needs special
handling; otherwise you just handle it as a single-byte-per-character
string ... and for the diehards you add a switch to treat everything as
it is now :)
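
A rough sketch of the flag idea, again in Python and purely illustrative.
The names TaggedString, read_string and char_count are hypothetical, and the
test for "needs special handling" is assumed here to mean "contains any
non-ASCII byte", which is one possible reading of the suggestion above.

```python
from dataclasses import dataclass


@dataclass
class TaggedString:
    raw: bytes            # bytes as read from the source (assumed to be UTF-8)
    needs_unicode: bool   # flag set once, at read time


def read_string(raw: bytes) -> TaggedString:
    # Assumption: only strings with a byte >= 0x80 need special handling.
    return TaggedString(raw, needs_unicode=not raw.isascii())


def char_count(s: TaggedString) -> int:
    if not s.needs_unicode:
        # Fast path: one byte per character, so no decoding is required.
        return len(s.raw)
    # Slow path: decode to count code points rather than bytes.
    return len(s.raw.decode("utf-8"))
```

Pure-ASCII strings take the fast path on every operation; only flagged
strings pay for decoding.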

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
