Re[2]: [lazarus] UTF-8 vs Unicode - Could someone explain?

Sergei Gorelkin Fri, 20 Oct 2006 09:44:01 -0700

Graeme Geldenhuys wrote:

GG> Now this brings me to another point which makes no sense! Naming
GG> convention of functions.
GG> Let's look at the following RTL function as an example:


GG> * StrPos is used for 1 byte (8 bit) ANSI strings.

GG> * AnsiStrPos is used for multi-byte (or is that 2 bytes max) UTF-8
GG> strings. aka WideString.

GG> So why did Borland name it AnsiStrPos, when it doesn't operate on ANSI
GG> strings!!  Why not name it Utf8StrPos or WideStrPos?  The prefix Ansi*
GG> completely goes against what it does (operates on)! It doesn't work
GG> with Ansi strings, but rather WideStrings.

No, AnsiStr* functions have nothing in common with Unicode. They work with
MBCS (MultiByte Character Set) encodings. MBCS is like UTF-8 in the sense
that a single character may be represented by more than one byte, but the
encoding rules are different, and these MBCS encoding rules are
locale-dependent. Each locale defines a set of 'lead bytes' which signify
the start of a multibyte sequence. For most European locales this set is
empty, and therefore AnsiStr* functions operate exactly as Str*.
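
To make the lead-byte idea concrete, here is a rough sketch in Python (not Pascal, and not how the RTL actually implements AnsiStrPos). It assumes the Shift-JIS lead-byte ranges of code page 932 as an example locale; the names SJIS_LEAD_BYTES, byte_pos and mbcs_pos are made up for this illustration.

```python
# Assumed lead-byte set for one MBCS locale: Shift-JIS (code page 932),
# ranges 0x81-0x9F and 0xE0-0xFC. Other locales define other sets, and
# for most European locales the set is empty.
SJIS_LEAD_BYTES = frozenset(range(0x81, 0xA0)) | frozenset(range(0xE0, 0xFD))


def byte_pos(haystack: bytes, needle: bytes) -> int:
    """Plain byte-wise search, in the spirit of StrPos."""
    return haystack.find(needle)


def mbcs_pos(haystack: bytes, needle: bytes, lead_bytes=SJIS_LEAD_BYTES) -> int:
    """Search that only accepts matches starting on a character boundary."""
    i = 0
    while i <= len(haystack) - len(needle):
        if haystack.startswith(needle, i):
            return i
        # Skip a whole character: two bytes after a lead byte, else one.
        i += 2 if haystack[i] in lead_bytes else 1
    return -1


# 83 5C is one two-byte Shift-JIS character (katakana "so"); its trail
# byte 5C has the same value as an ASCII backslash.
text = b"\x83\x5cA\x5c"
backslash = b"\x5c"

print(byte_pos(text, backslash))   # 1: false hit inside the first character
print(mbcs_pos(text, backslash))   # 3: the real backslash
print(mbcs_pos(text, backslash, lead_bytes=frozenset()))  # 1: same as byte-wise
```

With an empty lead-byte set every byte is its own character, so the boundary-aware search degenerates into the plain byte-wise one, which is the European-locale case described above.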


-- 
Best regards,
 Sergei


