Re: [lazarus] UTF-8 vs UTF-16 support

Marco Ciampa Sun, 07 Oct 2007 15:37:51 -0700

On Fri, Oct 05, 2007 at 01:14:23PM +0200, Luca Olivetti wrote:
> [EMAIL PROTECTED] wrote:
>
>> * WideString allows indexed "[]" access to individual chars.
>> This does not seem to be correct. I read that a UTF-16 character can be
>> 4 bytes long, so some calculation is sometimes needed...
>
> Unless you're dealing with klingon and ancient languages, 
Like Chinese? Just a billion people use it...not a real problem at all...
:-\


> I think you can assume that for 99.99% of currently spoken languages every
> character will be exactly 2 bytes long. 
Wrong, as I said before.
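
To make the 2-byte question concrete, here is a small sketch (in Python
rather than Pascal, purely for illustration) that counts how many 16-bit
code units UTF-16 needs per character. The helper name utf16_units is
made up for this example. Everything in the Basic Multilingual Plane,
including the most common Chinese characters, takes one unit; anything
above U+FFFF, such as the CJK Extension B ideographs, takes a surrogate
pair, i.e. 4 bytes.

```python
def utf16_units(ch: str) -> int:
    """Return the number of 16-bit code units UTF-16 needs for ch.

    utf16_units is a hypothetical helper for this example only.
    """
    # "utf-16-le" emits no byte-order mark, so bytes // 2 == code units.
    return len(ch.encode("utf-16-le")) // 2


samples = [
    ("\u0041", "Latin capital A"),
    ("\u4e2d", "CJK ideograph U+4E2D, in the BMP"),
    ("\U00020000", "CJK Extension B ideograph U+20000, outside the BMP"),
]

for ch, description in samples:
    units = utf16_units(ch)
    print(f"U+{ord(ch):04X} {description}: {units} unit(s), {units * 2} bytes")
```

So "exactly 2 bytes" holds only as long as no character outside the BMP
ever shows up in the string.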

> There's a risk of having some character with more than 2 bytes but it is 
> a small risk. 
> With utf-8 the risk is bigger, so you have always to traverse 
> the string if you need access to a specific character index.
You have to traverse the string with both UTF-8 and UTF-16 encodings,
so the advantage of UTF-16 is questionable at best...
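
A rough sketch of what "traversing" means for UTF-16, again in Python
only for illustration. The function names to_utf16_units and char_at are
invented for this example and are not part of any Lazarus or FPC API.
The string is treated as an array of 16-bit code units, and finding the
n-th character means walking from the start and stepping over surrogate
pairs, exactly as one would step over multi-byte sequences in UTF-8.

```python
import struct


def to_utf16_units(text):
    """Encode text as a list of 16-bit UTF-16 code units (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def char_at(units, index):
    """Return the code point of the index-th character (0-based).

    Walks the code units from the start, because a character may occupy
    one unit or two (a high surrogate followed by a low surrogate).
    """
    i = 0
    count = 0
    while i < len(units):
        unit = units[i]
        is_pair = (
            0xD800 <= unit <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            code_point = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            code_point = unit
            step = 1
        if count == index:
            return code_point
        count += 1
        i += step
    raise IndexError("character index out of range")


units = to_utf16_units("a\U00020000b")
print(len(units))               # code units, not characters
print(hex(units[2]))            # plain indexing lands inside the surrogate pair
print(hex(char_at(units, 2)))   # traversal finds the third character
```

Plain "[]" indexing gives you code unit 2, which is half of a character;
only the traversal gives you the third character.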

ciao

-- 

Marco Ciampa

+--------------------+
| Linux User  #78271 |
| FSFE fellow   #364 |
+--------------------+

