Re: [delphi-en] extracting the n'th character of a widestring

Rob Kennedy Thu, 25 Jan 2007 18:07:17 -0800

Andries Bos wrote:
> I am busy converting characters within a widestring to an xml stream;


There shouldn't be any conversion necessary. XML already expects Unicode.
It's harder to _not_ use Unicode when you're dealing with XML.

> However
> I encounter problems with 'special characters' e.g. the euro sign.
>
> Searching for a solution, I did find within the about.delphi.com website:
> About Unicode character sets
>
> The ANSI character set used by Windows is a single-byte character set.

Not always. The character set used by Windows NT and later is Unicode.
Specifically, UTF-16. The non-Unicode character set depends on the user's
locale settings and may be just one byte per character or a variable
number of bytes.

> Unicode stores each character in the character set in 2 bytes instead
> of 1.

Unicode just defines code points. How those code points are stored depends
on the encoding. UTF-16 is one encoding; it uses two bytes for nearly
every character and four bytes (a surrogate pair) for the rarely used
characters outside the Basic Multilingual Plane. UTF-32 uses four bytes
per character. UTF-8 uses between one and four bytes per character. All
are Unicode.
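
To make the sizes concrete, here's a minimal sketch that stores the euro
sign (code point U+20AC) in two encodings and counts the bytes. It assumes
Delphi 6 or later, where UTF8Encode is available in the System unit; the
program and variable names are made up for illustration.

```delphi
program EncodingSizes;
{$APPTYPE CONSOLE}

var
  Euro: WideString;
begin
  // Build the string from its code point so the source file's
  // encoding doesn't affect the result.
  Euro := WideChar($20AC);

  // UTF-16: one 16-bit code unit, so two bytes.
  Writeln('UTF-16 bytes: ', Length(Euro) * SizeOf(WideChar));

  // UTF-8: the same code point takes three bytes.
  Writeln('UTF-8 bytes: ', Length(UTF8Encode(Euro)));
end.
```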

> Some national languages use ideographic characters, which require
> more than the 256 characters supported by ANSI. With 16-bit notation we
> can represent 65,536 different characters. Indexing of multibyte
> strings is not reliable, since s[i] represents the ith byte (not
> necessarily the i-th character) in s.

Note that where that text says "multibyte," it is *not* referring to
Unicode. It's referring to the locale-specific character sets.

> If you must use Wide characters, you should declare a string variable
> to be of the WideString type and your character variable of the
> WideChar type. If you want to examine a wide string one character at a
> time, be sure to test for multibyte characters.

That's only necessary if your program really needs to handle the handful
of characters that UTF-16 can't represent in a single 16-bit word. Those
take two words, called a surrogate pair. Most of the time, you don't need
to worry about that.
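
If you do need to care, here's a rough sketch of how the check could
look. It counts code points rather than 16-bit words by treating a high
surrogate ($D800..$DBFF) followed by a low surrogate ($DC00..$DFFF) as one
character. CodePointCount is a hypothetical helper, not something from the
RTL, and the test string is just an example.

```delphi
program SurrogateCount;
{$APPTYPE CONSOLE}

// Hypothetical helper: counts code points in a UTF-16 string.
function CodePointCount(const S: WideString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    // A high surrogate followed by a low surrogate is one code point.
    // Delphi's default short-circuit evaluation keeps S[I + 1] in range.
    if (Ord(S[I]) >= $D800) and (Ord(S[I]) <= $DBFF) and
       (I < Length(S)) and
       (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
      Inc(I, 2)
    else
      Inc(I);
    Inc(Result);
  end;
end;

var
  S: WideString;
begin
  // 'A' followed by U+1D11E (musical G clef), which needs a surrogate pair.
  S := 'A';
  S := S + WideChar($D834) + WideChar($DD1E);

  Writeln('16-bit words: ', Length(S));         // 3
  Writeln('Code points:  ', CodePointCount(S)); // 2
end.
```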

> Delphi doesn't support
> automatic type conversions between Ansi and Wide string types.

Yes it does. It does the conversion using the user's default character
set. That can be unreliable for your program, though, since you can't know
in advance what that character set will be. It's better to use Unicode
exclusively.
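
As a sketch of the difference, assuming Delphi 6 or later for UTF8Encode
and UTF8Decode: the plain assignment below converts with whatever the
user's default character set happens to be, while the UTF-8 round trip is
explicit and doesn't depend on the locale. The variable names are only for
illustration.

```delphi
program ConversionDemo;
{$APPTYPE CONSOLE}

var
  W: WideString;
  A: AnsiString;
  U: UTF8String;
begin
  W := WideChar($20AC);  // euro sign

  // Automatic conversion through the user's default character set.
  // The byte you get depends on the locale; a character set without
  // a euro sign will typically substitute something like '?'.
  A := W;
  Writeln('Ansi byte: ', Ord(A[1]));

  // Explicit conversion to UTF-8 and back loses nothing.
  U := UTF8Encode(W);
  Writeln('Round trip intact: ', UTF8Decode(U) = W);
end.
```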

> Does anyone know how to extract the nth character of type widechar of a
> widestring?

Use the bracket operator. ws[n]
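
A small sketch of that, assuming nothing beyond SysUtils for IntToHex.
WideString indexes start at 1, and each element is a WideChar. The string
contents here are just an example.

```delphi
program NthChar;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  ws: WideString;
  n: Integer;
begin
  // 'A', the euro sign, 'B'. The euro sign is written as a code point
  // so the source file's encoding doesn't matter.
  ws := 'A';
  ws := ws + WideChar($20AC);
  ws := ws + 'B';

  // ws[n] is the n'th WideChar; Ord gives its numeric value.
  for n := 1 to Length(ws) do
    Writeln(n, ': U+', IntToHex(Ord(ws[n]), 4));
end.
```

That should print U+0041, U+20AC and U+0042 for positions 1, 2 and 3.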

> My example:
>
> var
>  Value : widestring;
> begin
>  Value := '€';
>
> examining this example will result in:
>
> Value = '€'  -> TRUE
> but
> Value[1] = '€'  -> FALSE

Beware of how the character literal is being compiled. It might be
compiled as a WideChar, or it might be a Char or an AnsiString instead.
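
One way to take the literal's type out of the picture is to write the
character as a WideChar code point instead. This is only a sketch, using
U+20AC for the euro sign:

```delphi
program LiteralCompare;
{$APPTYPE CONSOLE}

var
  Value: WideString;
begin
  Value := WideChar($20AC);

  // Both sides of the comparison are WideChar, so no character-set
  // conversion is involved.
  Writeln(Value[1] = WideChar($20AC)); // TRUE
  Writeln(Ord(Value[1]));              // 8364
end.
```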

> Converting a widestring character to xml format, I use:
> '&#x' + IntToHex(LOrd, 4) + ';'

Why are you doing that? Doesn't your XML library already support Unicode?
If it doesn't, you should consider getting a different library. Any good
XML library should be able to handle character data natively. It shouldn't
require you to encode anything yourself.

> Parsing the variable Value

How do you parse the variable?

> ord(value[1]) will result in 8364  and
> ord(value) will result in 0080 ;

I doubt that. Since Value is a WideString, Ord(value) will give you the
address of the WideString's memory interpreted as an integer. On the
other hand, 80h is the code used for the euro character in some Windows
character sets, such as Windows-1252. It's not the Unicode code point for
that character, though. That's U+20AC, which is 8364 in decimal.

-- 
Rob
