RE: R: R: R: using non standard character with zerces

Jesse Pelton Mon, 19 Sep 2005 07:45:25 -0700

I should know better than to just ape other people's code without understanding 
it.  What does the X() macro or function do?


It's starting to sound like the problem is your compiler's wide character 
support (if any).  Does your compiler have support for strings of characters 
with more than 7 bits?  If not, you'll probably have to create XMLCh arrays 
rather than native strings.  If you put the following XMLCh string into your 
DOM, you should get a parenthesized yen symbol in the output:

  XMLCh xmlStr[] = { '(', 0xA5, ')', chNull };

If you need cross-platform portability, this is definitely the way to go.  If 
you look in XMLUni.cpp, you'll see dozens of strings defined this way precisely 
because compiler support for wide character strings is quite variable.
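
To make that concrete, here is a minimal sketch of the array approach that compiles without Xerces. XMLChSketch, chNullSketch and sketchLen are made-up names for illustration; I'm assuming a 16-bit code unit (char16_t) as a stand-in for XMLCh, whose real definition depends on the platform.

```cpp
#include <cstddef>

// Stand-in for Xerces' XMLCh (assumption: a 16-bit code unit type).
typedef char16_t XMLChSketch;

// Stand-in for chNull, the string terminator.
const XMLChSketch chNullSketch = 0;

// "(¥)" spelled out as numeric code units, so neither the source file
// encoding nor the local code page can change the stored values.
const XMLChSketch yenInParens[] = { '(', 0xA5, ')', chNullSketch };

// Hypothetical helper: count code units up to the terminator.
std::size_t sketchLen(const XMLChSketch* s) {
    std::size_t n = 0;
    while (s[n] != chNullSketch) {
        ++n;
    }
    return n;
}
```

With real Xerces types you would hand such an array straight to the DOM call that takes a const XMLCh*, with no transcoding step in between.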

> -----Original Message-----
> From: AESYS S.p.A. [Enzo Arlati] [mailto:[EMAIL PROTECTED] 
> Sent: Monday, September 19, 2005 10:22 AM
> To: [email protected]
> Subject: R: R: R: R: using non standard character with zerces
> 
> 
> but when I try to translate a string with a char greater than 
> hex 7f I get an empty string.
> So 
>    XMLCh * xmlStr = X( "1234 \x28 & \x29 " )
> gives the string "1234 ( &amp; )",
> while if I add the char \x80 I get an empty string.
> What can I do to also handle chars with codes between 127 and 255?
> 
> 
> -----Original Message-----
> From: Jesse Pelton [mailto:[EMAIL PROTECTED]
> Sent: Monday, September 19, 2005 15:25
> To: [email protected]; [EMAIL PROTECTED]
> Subject: RE: R: R: R: using non standard character with zerces
> 
> 
> Sure, you can store 0xA5 in a DOM string, but you have to 
> represent it properly in the string that you store.  This 
> means you have to store the character value 0xA5 in the 
> string; you cannot represent it in the string as a numeric 
> entity like "&#xA5;":
> 
>    XMLCh* pszA5Good = X("\xA5");   // Yen
>    XMLCh* pszA5Bad  = X("&#xA5;"); // gobbledygook
> 
> Both strings are perfectly legitimate, but if you put the 
> latter into the DOM, the serializer MUST escape the ampersand 
> so that the string you are adding to the DOM can be 
> faithfully recovered.  In other words, if you say the string 
> is "&#xA5;", the serializer must escape it so that when it's 
> parsed, the string's value remains "&#xA5;", because that's 
> the string you specified.
> 
> If you put the former into the DOM, the serializer will 
> likewise do what it must to ensure that the specified string 
> comes back when Xerces or some other conforming XML processor 
> parses the document.  Depending on the document encoding, it 
> may or may not be serialized as "&#xA5;".  Any conforming 
> processor that recognizes the document encoding will parse 
> the serialized value correctly.
> 
> The bottom line is, don't pre-escape anything that you put 
> into the DOM.  If you do, the serializer must escape it 
> again, and you won't get your desired results.  Rather than:
> 
>   stmp = " start &apos; &lt; &gt;  &amp; &#x28; &#xA4; &#xA5; &#x29; end";
>   dtxt = pDoc->createTextNode( X( stmp.c_str()));
> 
> Do:
> 
>   dtxt = pDoc->createTextNode( X(" start ' < >  & \x28 \xA4 \xA5 \x29 end"));
> 
> Or equivalently:
> 
>   dtxt = pDoc->createTextNode( X(" start ' < >  & ( \xA4 \xA5 ) end"));
> 
> Note that none of this is specific to Xerces.  Any XML 
> processor that conforms to the specifications (available at 
> www.w3.org) must behave this way.
> 
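
To illustrate the escaping rule in the quoted message, here is a rough, self-contained sketch of what a serializer does to text content when the output encoding is ASCII-only. escapeTextSketch is a made-up helper, not a Xerces API, and it ignores surrogate pairs for brevity; a real serializer's choices depend on the document encoding.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not a Xerces API: escape text content for a
// document written in an ASCII-only encoding.
std::string escapeTextSketch(const std::u16string& text) {
    std::string out;
    for (char16_t c : text) {
        if (c == u'&') {
            out += "&amp;";              // markup characters are always escaped
        } else if (c == u'<') {
            out += "&lt;";
        } else if (c == u'>') {
            out += "&gt;";
        } else if (c < 0x80) {
            out += static_cast<char>(c); // representable in ASCII as-is
        } else {
            // Not representable in the target encoding: emit a numeric
            // character reference such as "&#xA5;".
            char buf[16];
            std::snprintf(buf, sizeof buf, "&#x%X;", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

Given the single character U+00A5, this produces the character reference for the yen sign. Given the six characters of a pre-escaped reference, the leading ampersand gets escaped again, so a parser hands back the literal text rather than a yen sign.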
