On 10/8/07, Luca Olivetti <[EMAIL PROTECTED]> wrote:
> En/na Luca Olivetti ha escrit:
>
> >> You have to go through the string for UTF-8 and UTF-16 encodings so
> >> the advantages are at least questionable...
> >
> > Yes, but my (wrong) premise is that you could assume all characters are
> > 2 bytes wide, so the Nth character would be at N*2 byte.
>
> BTW, using strings as arrays of char to get at individual characters is
> risky business with utf-8. Or will they be converted to (pseudo)
> properties and (slowly) do the (slow) right thing?
> I also suppose that the functions in strutils are not utf-8 aware, so
> what should we be using in their place?

For single-character processing, UTF-32 (4 bytes per character) would be
nice :). I think functions to count the UTF-8 characters in a string and
to fetch each character would be nice too, maybe even implemented in FPC
for UTF8String, so that Length(utf8string) returns the character count
and indexing utf8string[1] returns the character (as UTF-32), not the
byte.
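
A rough sketch of what such helpers could look like, written in Python
over raw bytes only to keep it short. The names utf8_length and
utf8_char_at are made up for illustration and are not existing FPC or
Lazarus functions. The sketch assumes well-formed UTF-8 input and counts
code points, not user-perceived characters (combining sequences are not
merged).

```python
def utf8_length(data: bytes) -> int:
    """Count code points: every byte except continuation bytes (10xxxxxx) starts one."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_char_at(data: bytes, index: int) -> str:
    """Return the code point at 1-based position `index` (Pascal-style indexing).

    The string has to be scanned from the start, so this is O(n), not O(1).
    """
    if index < 1:
        raise IndexError("index is 1-based")
    pos = 0
    count = 0
    while pos < len(data):
        lead = data[pos]
        # The lead byte tells us how many bytes the sequence occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("invalid UTF-8 lead byte at offset %d" % pos)
        count += 1
        if count == index:
            return data[pos:pos + width].decode("utf-8")
        pos += width
    raise IndexError("index past end of string")
```

For the 7-byte string "añb€" encoded as UTF-8, utf8_length returns 4 and
utf8_char_at(s, 2) returns "ñ", whereas plain byte indexing would land in
the middle of a multi-byte sequence.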

Since FPC uses ANSI strings a lot, and most text is Latin-1 without any
diacritics, using UTF-8 in Lazarus is a good choice. If the right
functions are provided it can be a great choice, unless apps become too
slow.

Since the web mostly uses UTF-8 to minimize transferred data, and most
databases use it to minimize storage size, it becomes clear that UTF-8 is
the better choice, provided helper functions exist to assist with its
management.
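
To put rough numbers on the size argument, here is a small Python sketch
that reports how many bytes the same text takes in each encoding. The
function name encoded_sizes is made up for illustration, and the
little-endian variants are used so that no byte-order mark is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded byte length of `text` in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # "-le" avoids a 2-byte BOM
        "utf-32": len(text.encode("utf-32-le")),  # "-le" avoids a 4-byte BOM
    }
```

Plain ASCII text such as "Lazarus" comes out at 7, 14 and 28 bytes
respectively. The euro sign takes 3, 2 and 4 bytes, and a character
outside the Basic Multilingual Plane such as U+1D11E takes 4 bytes in all
three, which is also why assuming 2 bytes per character does not hold for
UTF-16.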

Razvan

