Rob, thanks for your explanation.

Like many others, I had the wrong idea about Unicode and how to use it.
I found a site that explains Unicode in a simple way; maybe it helps
others too:

http://www.joelonsoftware.com/articles/Unicode.html

Regards

andriew

----- Original Message ----
From: Rob Kennedy <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, January 25, 2007 8:42:43 PM
Subject: Re: [delphi-en] extracting the n'th character of a widestring

Andries Bos wrote:

> I am busy converting characters within a WideString to an XML stream;

There shouldn't be any conversion necessary. XML already expects Unicode.
It's harder to _not_ use Unicode when you're dealing with XML.

> However, I encounter problems with 'special characters', e.g. the euro sign.
>
> Searching for a solution, I found this on the about.delphi.com website:
>
> About Unicode character sets
>
> The ANSI character set used by Windows is a single-byte character set.

Not always. The character set used by Windows NT and later is Unicode,
specifically UTF-16. The non-Unicode character set depends on the user's
locale settings and may use just one byte per character or a variable
number of bytes.

> Unicode stores each character in the character set in 2 bytes instead
> of 1.

Unicode just defines code points. How those code points are stored depends
on the encoding. UTF-16 is one encoding that uses two bytes for nearly
every commonly used character and four bytes (a surrogate pair) for the
rarely used characters outside the Basic Multilingual Plane. UTF-32 uses
four bytes per character. UTF-8 uses between one and four bytes per
character (early versions of its specification allowed up to six). All
are Unicode.
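To make the distinction between code points and encodings concrete, here is a small sketch in Python (used instead of Delphi only because its codecs make byte counts easy to inspect). The three sample characters are chosen for illustration; the "-le" codec variants are used so that no byte order mark is counted.

```python
# Byte counts for the same code points under three Unicode encodings.
samples = {
    "A": "A",                # U+0041, plain ASCII
    "euro": "\u20ac",        # U+20AC, inside the Basic Multilingual Plane
    "g_clef": "\U0001d11e",  # U+1D11E, outside the BMP
}

def byte_sizes(ch: str) -> dict:
    """Return the encoded size of ch in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }

for name, ch in samples.items():
    print(f"U+{ord(ch):04X} ({name}): {byte_sizes(ch)}")
```

The euro sign takes three bytes in UTF-8 but two in UTF-16, while the G clef is one of the characters that needs four bytes in UTF-16.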

> Some national languages use ideographic characters, which require
> more than the 256 characters supported by ANSI. With 16-bit notation we
> can represent 65,536 different characters. Indexing of multibyte
> strings is not reliable, since s[i] represents the i-th byte (not
> necessarily the i-th character) in s.

Note that where that text says "multibyte," it is *not* referring to
Unicode. It's referring to the locale-specific character sets, such as
the double-byte code pages used on Japanese or Chinese Windows.
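A small Python sketch of the indexing problem with a locale-specific multibyte character set. It assumes code page 932 (Shift-JIS, used by Japanese Windows), picked only as an example: two characters become four bytes, so indexing the encoded bytes does not give you characters.

```python
text = "\u65e5\u672c"        # two kanji characters ("Japan")
ansi = text.encode("cp932")  # the bytes a Shift-JIS AnsiString would hold

print(len(text), len(ansi))  # character count vs. byte count
# Indexing the byte string yields a lead byte, not the character text[0].
print(hex(ansi[0]))
# Decoding has to operate on whole multibyte sequences.
first_char = ansi[:2].decode("cp932")
```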

> If you must use wide characters, you should declare a string variable
> to be of the WideString type and your character variable of the
> WideChar type. If you want to examine a wide string one character at a
> time, be sure to test for multibyte characters.

That's only if your program really needs to worry about the characters
that UTF-16 can't represent in a single 16-bit word; those are stored as
a surrogate pair of two WideChars. Most of the time, you don't need to
worry about that.
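For the cases where surrogate pairs do matter, here is a minimal Python sketch of what "testing for multibyte characters" means in UTF-16. It models a WideString as a list of 16-bit code units; the helper names utf16_units and code_points are hypothetical, for illustration only.

```python
def utf16_units(s: str) -> list:
    """Split s into UTF-16 code units, the way a WideString stores it."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def code_points(units: list) -> list:
    """Walk UTF-16 code units, joining each surrogate pair into one code point."""
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            result.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # Ordinary BMP character (or an unpaired surrogate, kept as-is).
            result.append(unit)
            i += 1
    return result

units = utf16_units("\u20ac\U0001d11e")  # euro sign + musical G clef
print([hex(u) for u in units])
print([hex(cp) for cp in code_points(units)])
```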

> Delphi doesn't support automatic type conversions between Ansi and
> Wide string types.

Yes, it does. It does the conversion using the user's default character
set. That can be unreliable for your program, though, since you can't
know in advance what that character set will be. It's better to use
Unicode exclusively.
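A short Python illustration of why relying on the default character set is risky: the same single byte converts to different characters depending on the ANSI code page. Windows-1252 (Western) and Windows-1251 (Cyrillic) are just two example code pages.

```python
raw = b"\x80"  # one byte taken from an AnsiString

western = raw.decode("cp1252")   # Windows-1252
cyrillic = raw.decode("cp1251")  # Windows-1251

print(f"cp1252: U+{ord(western):04X}  cp1251: U+{ord(cyrillic):04X}")
```

Under Windows-1252 the byte is the euro sign; under Windows-1251 it is the Cyrillic letter U+0402, so the same program can produce different text on differently configured machines.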

> Does anyone know how to extract the n-th character (of type WideChar)
> of a WideString?

Use the bracket operator: ws[n]. As with other Delphi string types, the
first character is at index 1.
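A Python sketch that mimics what ws[n] returns, assuming WideString semantics: indexes start at 1 and each element is one UTF-16 code unit (a WideChar), so a character outside the BMP occupies two positions. The function name wide_char_at is hypothetical.

```python
def wide_char_at(s: str, n: int) -> int:
    """Return the n-th UTF-16 code unit of s, counting from 1 like ws[n]."""
    data = s.encode("utf-16-le")
    length = len(data) // 2  # what Length(ws) would report
    if not 1 <= n <= length:
        # This sketch raises an error; it does not model Delphi's behavior here.
        raise IndexError(f"index {n} outside 1..{length}")
    offset = (n - 1) * 2
    return int.from_bytes(data[offset:offset + 2], "little")

print(hex(wide_char_at("a\u20acb", 2)))  # the euro sign's code unit
```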

> My example:
>
> var
>  Value : widestring;
> begin
>  Value := '€';
>
> Examining this example results in:
>
> Value = '€'     > TRUE
> but
> Value[1] = '€'  > FALSE

Check whether the character literal is being compiled as a WideChar, or
as a Char or an AnsiString instead.
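One possible explanation, modeled in Python under an assumption: that the single-character literal '€' is compiled as an ANSI Char holding the Windows-1252 byte $80. Comparing whole strings goes through a code-page conversion, but comparing a WideChar with the raw byte value compares $20AC with $80. This sketches the suspicion; it is not a statement of Delphi's exact compiler rules.

```python
ansi_char = "\u20ac".encode("cp1252")[0]  # assumed ANSI literal: byte 0x80
value = "\u20ac"                          # the WideString holding the euro sign

# String comparison: the ANSI side is converted through the code page first.
strings_equal = value == bytes([ansi_char]).decode("cp1252")

# Character comparison without conversion: 0x20AC versus 0x80.
chars_equal = ord(value[0]) == ansi_char

print(strings_equal, chars_equal)
```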

> To convert a WideString character to XML format, I use:
> '&#x' + IntToHex(LOrd, 4) + ';'

Why are you doing that? Doesn't your XML library already support Unicode?
If it doesn't, you should consider getting a different library. Any good
XML library should be able to handle character data natively. It shouldn't
require you to encode anything yourself.
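To illustrate letting the XML library do the work, here is a sketch using Python's standard xml.etree.ElementTree, standing in for whatever library the Delphi program uses. The euro sign is assigned as ordinary text; the library writes it directly when serializing to a Unicode string and emits a character reference only when the output encoding (ASCII here) cannot hold it. Building references by hand has a further pitfall: if LOrd comes from a single WideChar, a character outside the BMP would produce a reference to half of a surrogate pair, which XML does not allow.

```python
import xml.etree.ElementTree as ET

price = ET.Element("price")
price.text = "\u20ac"  # plain character data; no manual escaping

as_text = ET.tostring(price, encoding="unicode")    # serialized as a str
as_ascii = ET.tostring(price, encoding="us-ascii")  # bytes; non-ASCII becomes a reference

# Reading the ASCII output back yields the same character.
round_trip = ET.fromstring(as_ascii).text

print(as_text)
print(as_ascii)
```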

> Parsing the variable Value

How do you parse the variable?

> ord(value[1]) will result in 8364 and
> ord(value) will result in 0080;

I doubt that. Since Value is a WideString, Ord(Value) will give you the
address of the WideString's memory interpreted as an integer. On the
other hand, 80h is the value frequently used for the euro character in
some Windows character sets, such as Windows-1252. It's not the Unicode
code point for that character, though; that is U+20AC, which is 8364 in
decimal, matching what ord(value[1]) returned.
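A Python sketch of why the two numbers matter when building character references: 8364 ($20AC) is the Unicode code point of the euro sign, while $80 is its byte value in Windows-1252. A reference built from $80 is read by an XML parser as U+0080, a control character, not the euro sign. The char_ref helper is hypothetical and mirrors the '&#x' + IntToHex(LOrd, 4) + ';' expression from the question.

```python
import xml.etree.ElementTree as ET

def char_ref(code: int) -> str:
    """Hypothetical helper equivalent to '&#x' + IntToHex(code, 4) + ';'."""
    return "&#x" + format(code, "04X") + ";"

unicode_value = ord("\u20ac")              # what ord(value[1]) reported
ansi_value = "\u20ac".encode("cp1252")[0]  # the euro's byte in Windows-1252

good = ET.fromstring("<p>" + char_ref(unicode_value) + "</p>").text
bad = ET.fromstring("<p>" + char_ref(ansi_value) + "</p>").text

print(char_ref(unicode_value), repr(good))
print(char_ref(ansi_value), repr(bad))
```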

-- 
Rob