Andries Bos wrote:
> I busy to convert characters within a widestring to an xml stream;

There shouldn't be any conversion necessary. XML already expects Unicode. It's harder to _not_ use Unicode when you're dealing with XML.

> However
> I encounter problems with 'special characters' e.g. the euro sign.
>
> Searching for a solution, I did find within the about.delphi.com website:
> About Unicode character sets
>
> The ANSI character set used by Windows is a single-byte character set.

Not always. The character set used by Windows NT and later is Unicode. Specifically, UTF-16. The non-Unicode character set depends on the user's locale settings and may be just one byte per character or a variable number of bytes.

> Unicode stores each character in the character set in 2 bytes instead
> of 1.

Unicode just defines code points. How those code points are stored depends on the encoding. UTF-16 is one encoding that uses two bytes for nearly every character, four bytes for a handful of rarely used characters. UTF-32 uses four bytes per character. UTF-8 uses between one and four bytes per character. All are Unicode.

> Some national languages use ideographic characters, which require
> more than the 256 characters supported by ANSI. With 16-bit notation we
> can represent 65,536 different characters. Indexing of multibyte
> strings is not reliable, since s[i] represents the ith byte (not
> necessarily the i-th character) in s.

Note that where that text says "multibyte," it is *not* referring to Unicode. It's referring to the locale-specific character sets.

> If you must use Wide characters, you should declare a string variable
> to be of the WideString type and your character variable of the
> WideChar type. If you want to examine a wide string one character at a
> time, be sure to test for multibyte characters.

That's only if your program really needs to worry about the handful of characters in UTF-16 that can't be represented by a single 16-bit word. Most of the time, you don't need to worry about that.
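
If your program does turn out to need that test, here is a minimal sketch of what it could look like. It assumes a console program and a WideString holding UTF-16; CountCodePoints is a hypothetical helper written for this illustration, not an RTL routine. It walks the string and treats a high surrogate followed by a low surrogate as one character.

```pascal
program SurrogateWalk;

{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not part of the RTL): counts Unicode code points
// in a UTF-16 WideString, counting each surrogate pair as one character.
function CountCodePoints(const S: WideString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    // A high surrogate (D800..DBFF) followed by a low surrogate
    // (DC00..DFFF) encodes one character outside the 16-bit range.
    if (Ord(S[I]) >= $D800) and (Ord(S[I]) <= $DBFF) and
       (I < Length(S)) and
       (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
      Inc(I, 2)
    else
      Inc(I);
    Inc(Result);
  end;
end;

var
  Euro, Clef: WideString;
begin
  // U+20AC (euro sign) fits in a single 16-bit unit.
  Euro := WideChar($20AC);
  // U+1D11E (musical G clef) needs the surrogate pair D834 DD1E.
  Clef := WideString(WideChar($D834)) + WideChar($DD1E);

  WriteLn(Length(Euro), ' ', CountCodePoints(Euro));
  WriteLn(Length(Clef), ' ', CountCodePoints(Clef));
end.
```

For the euro sign, Length and the code point count agree, which is why plain ws[n] indexing is fine for it. They differ only for characters such as the clef, which occupy two WideChars.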

> Delphi doesn't support
> automatic type conversions between Ansi and Wide string types.

Yes it does. It does the conversion using the user's default character set. That can be unreliable for your program, though, since you can't know in advance what that character set will be. It's better to use Unicode exclusively.

> Does anyone know how to extract the nth character of type widechar of a
> widestring?

Use the bracket operator. ws[n]

> My example:
>
> var
> Value : widestring;
> begin
> Value = '';
>
> examining this example will result in:
>
> Value = ''
> TRUE
> but
> Value[1] = '' FALSE

Beware of whether the character literal is being compiled as a WideChar rather than a Char or an AnsiString.

> Conversing widestring character to xml format , i use:
> '&#x' + IntToHex(LOrd, 4) + ';'

Why are you doing that? Doesn't your XML library already support Unicode? If it doesn't, you should consider getting a different library. Any good XML library should be able to handle character data natively. It shouldn't require you to encode anything yourself.

> Parsing the variable Value

How do you parse the variable?

> ord(value[1]) will result in 8364 and
> ord(value) will result in 0080 ;

I doubt that. Since Value is a WideString, Ord(value) will give you the address of the WideString's memory interpreted as an integer. On the other hand, 80h is the code point frequently used in some Windows character sets for the euro character. It's not the Unicode code point for that character, though.

-- 
Rob

