RE: R: R: R: using non standard character with zerces

Jesse Pelton Mon, 19 Sep 2005 10:56:08 -0700

You'd use the XMLCh array (xmlStr in my example) in your calls to, for example, 
createTextNode().  It's just a cumbersome but portable way to create a string 
of characters in Xerces' internal format.  Xerces uses the standardized UTF-16 
encoding to represent characters internally, so XMLCh is required to be (at 
least) a 2-byte (16-bit) type.  Some compilers (like Microsoft's) have a native 
string type that is an exact match.  With such a compiler, this:


   const XMLCh* xmlStr = L"(\xA5)";

is equivalent (for our purposes) to:

   XMLCh xmlStr[] = { '(', 0xA5, ')', chNull };

So, with either xmlStr, you could make a call like:

  dtxt = pDoc->createTextNode(xmlStr);

This would create a text node with a parenthesized yen symbol, which you could 
then insert into the document.
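
To make the equivalence concrete without relying on any one compiler's wchar_t, here is a small sketch that uses C++11's char16_t as a stand-in for XMLCh. The names Ch16, kArrayForm, kLiteralForm and sameString are made up for this illustration and are not part of Xerces. It only checks that the array form and the literal form hold the same code units.

```cpp
#include <cstddef>

// Assumption: a 16-bit code unit type standing in for Xerces' XMLCh.
typedef char16_t Ch16;

// Array form: each code unit spelled out, terminated by 0 (chNull in Xerces).
static const Ch16 kArrayForm[] = { '(', 0xA5, ')', 0 };

// Literal form: only available when the compiler has a matching string type.
static const Ch16* const kLiteralForm = u"(\u00A5)";

// Compare two null-terminated 16-bit strings unit by unit.
bool sameString(const Ch16* a, const Ch16* b)
{
    while (*a != 0 && *a == *b) {
        ++a;
        ++b;
    }
    return *a == *b;
}
```

With these definitions, sameString(kArrayForm, kLiteralForm) should hold, which is the sense in which either form could be handed to createTextNode().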

If your compiler does not have a native string type that matches Xerces' 
internal format, XMLCh is typically defined as an unsigned short on the 
assumption that it will be a two-byte type.  (This is the case for GCC.  
wchar_t could in theory be used, but with GCC it's typically a four-byte type, 
which is wasteful for most documents.)  There's no string literal notation for 
integral types, hence the necessity to use the cumbersome array notation to 
create your XMLCh strings.

Your modifications to XStr do not look safe to me.  You appear to be assuming 
that simple copies from one form to another will suffice, which effectively 
removes the transcoding that is the primary purpose of the class.  You can get 
away with this sometimes, but it won't work in the general case.  The fact that 
defeating the transcoding makes things appear to work lends support to 
Alberto's hypothesis that your current local code page is unable to represent 
one or more of the characters that you want to transcode.  Consequently, the 
transcoding fails with the original XStr.
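
Here is a rough sketch of why a plain copy only works sometimes. It is not Xerces code: Unit16 is a stand-in for XMLCh, and widenBytes and cp437ToUnicode are hypothetical helpers, the latter covering only a handful of code points. Copying bytes into 16-bit units gives the right answer only when the bytes are ISO-8859-1, because that encoding's values coincide with the first 256 Unicode code points. In another code page the same byte means something else.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumption: a 16-bit code unit type standing in for Xerces' XMLCh.
typedef std::uint16_t Unit16;

// Byte-for-byte widening, i.e. a "simple copy". The cast to unsigned char
// avoids sign extension (0xA5 becoming 0xFFA5 where char is signed).
// The result is correct only if the input bytes are ISO-8859-1.
std::vector<Unit16> widenBytes(const std::string& bytes)
{
    std::vector<Unit16> out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        out.push_back(static_cast<unsigned char>(bytes[i]));
    }
    out.push_back(0);  // null terminator
    return out;
}

// A deliberately tiny, partial CP437 lookup, for illustration only.
// Unknown high bytes map to U+FFFD (the replacement character).
Unit16 cp437ToUnicode(unsigned char b)
{
    switch (b) {
        case 0x9C: return 0x00A3;  // pound sign
        case 0x9D: return 0x00A5;  // yen sign
        case 0xA5: return 0x00D1;  // capital N with tilde
        default:   return b < 0x80 ? b : 0xFFFD;
    }
}
```

So byte 0xA5 widened directly yields U+00A5 (yen), which is right for ISO-8859-1 text, but the same byte in CP437 text should have become U+00D1, and the CP437 yen is byte 0x9D. That mapping step is what real transcoding does and what the plain copy skips.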

I'd avoid transcoding altogether unless you know precisely why you're doing it 
and what will happen.  Specifically, the XStr class is too simple-minded to 
handle the text you're giving it.  It happens to work for the ASCII text in the 
sample apps, but it's not really general.  If you want to continue to use it, 
you should probably enhance it to transcode to UTF-16 rather than the local 
code page, since UTF-16 is what Xerces is expecting.
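
One way to read that suggestion is to have the class convert from a known source encoding straight to UTF-16, instead of going through whatever the local code page happens to be. The sketch below assumes the input is UTF-8, which is my assumption and not something stated in this thread. Utf16Unit is a stand-in for XMLCh and utf8ToUtf16 is a hypothetical helper, not a Xerces function. It does not reject overlong encodings or encoded surrogates, so treat it as an illustration rather than a validating decoder.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: a 16-bit code unit type standing in for Xerces' XMLCh.
typedef std::uint16_t Utf16Unit;

// Decode UTF-8 bytes into null-terminated UTF-16 code units.
std::vector<Utf16Unit> utf8ToUtf16(const std::string& in)
{
    std::vector<Utf16Unit> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (cp < 0x10000) {
            out.push_back(static_cast<Utf16Unit>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<Utf16Unit>(0xD800 | (cp >> 10)));
            out.push_back(static_cast<Utf16Unit>(0xDC00 | (cp & 0x3FF)));
        }
    }
    out.push_back(0);  // null terminator
    return out;
}
```

For example, the UTF-8 bytes 0x28 0xC2 0xA5 0x29 decode to the same units as the XMLCh array { '(', 0xA5, ')', chNull }, independent of the local code page.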

I'd strongly recommend using XMLCh arrays for any literal strings instead.

Note that I could be wrong about some of this; I trust that Alberto or someone 
else will point out any errors.

> -----Original Message-----
> From: AESYS S.p.A. [Enzo Arlati] [mailto:[EMAIL PROTECTED] 
> Sent: Monday, September 19, 2005 11:54 AM
> To: [email protected]
> Subject: R: R: R: R: using non standard character with zerces
> 
> 
> In the beginning, the X and S macros simply called a class 
> that allocates and, more importantly, frees the memory 
> allocated by the transcode call, nothing more and nothing less 
> than what many Xerces examples and tutorials show.
> Now I have added more code to handle characters that require 
> more than 7 bits and, strangely enough, it seems to work.
> I suppose a better way to do this already exists, so I am 
> posting my revision of this class in case it can be improved 
> or someone has suggestions.
> 
> BTW, I didn't understand where and how to use the XMLCh array ( 
> XMLCh xmlStr[] = { '(', 0xA5, ')', chNull };). Could you please 
> provide an example if you have one?
> 
> #ifndef UTIL_XERCES_H
> #define UTIL_XERCES_H
> 
> 
> #include <string>
> #include <xercesc/util/PlatformUtils.hpp>
> //#include <xercesc/util/XMLString.hpp>
> #include <xercesc/dom/DOM.hpp>
> #include <xercesc/dom/DOMNode.hpp>
> 
> #include <stdio.h>
> 
> 
> XERCES_CPP_NAMESPACE_USE
> 
> class util_xerces
> {
> 
> public:
>     util_xerces();
>     ~util_xerces();
> 
>     static int compareXS(char * str, const XMLCh * xml);
>     static DOMNode* getChildByName(DOMNode *node, char *name);
> 
> };
> 
> 
> // §§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§
> 
> class XStr
> {
> public :
>     // -----------------------------------------------------------------------
>     //  Constructors and Destructor
>     // -----------------------------------------------------------------------
>     XStr(const char* const toTranscode)
>     {
>         XMLCh  c;
>         int idx;
>         fChar = 0;
>         fUnicodeForm = 0;
> 
>         if( toTranscode )
>         {
>            //fUnicodeForm = XMLString::transcode(toTranscode);
>            fUnicodeForm =  XMLString::transcode( "" );
>            int ilen = strlen( toTranscode );
>            fUnicodeForm = XMLString::transcode( string(ilen, ' ').c_str() );
> 
>            for( idx = 0 ; idx < ilen; idx++ )
>            {
>                c = toTranscode[idx];
>                fUnicodeForm[idx] = c;
>            }
>         }
>     }
>     // -----------------------------------------------------------------------
>     XStr(const XMLCh* const toTranscode)
>     {
>         unsigned char c;
> 
>         fChar = 0;
>         fUnicodeForm = 0;
> 
>         if( toTranscode )
>         {
>            // fChar = XMLString::transcode(toTranscode);
>            // fString =  string( fChar );
> 
>            int ilen = XMLString::stringLen( toTranscode );
>            fString = "";
>            for( int idx = 0 ; idx < ilen; idx++ )
>            {
>                c = ( unsigned char ) toTranscode[idx];
>                fString += string( 1, c );
>            }
>         }
>     }
>     // -----------------------------------------------------------------------
>     ~XStr()
>     {
>         if( fUnicodeForm ) XMLString::release(&fUnicodeForm);
>         if( fChar )        XMLString::release(&fChar);
>     }
> 
> 
>     // -----------------------------------------------------------------------
>     //  Getter methods
>     // -----------------------------------------------------------------------
>     const XMLCh* unicodeForm() const
>     {
>         return fUnicodeForm;
>     }
>     // -----------------------------------------------------------------------
>     string toString() const
>     {
>         return fString;
>     }
> 
> private :
>     // -----------------------------------------------------------------------
>     //  Private data members
>     //
>     //  fUnicodeForm
>     //      This is the Unicode XMLCh format of the string.
>     // -----------------------------------------------------------------------
>     XMLCh*   fUnicodeForm;
>     char *   fChar;
>     string   fString;
> };
> 
> #define X(str) XStr(str).unicodeForm()
> #define S(str) XStr(str).toString()
> 
> #endif
> 
> 
> -----Messaggio originale-----
> Da: Jesse Pelton [mailto:[EMAIL PROTECTED]
> Inviato: lunedì 19 settembre 2005 16.45
> A: [email protected]; [EMAIL PROTECTED]
> Oggetto: RE: R: R: R: using non standard character with zerces
> 
> 
> I should know better than to just ape other people's code 
> without understanding it.  What does the X() macro or function do?
> 
> It's starting to sound like the problem is your compiler's 
> wide character support (if any).  Does your compiler have 
> support for strings of characters with more than 7 bits?  If 
> not, you'll probably have to create XMLCh arrays rather than 
> native strings.  If you put the following XMLCh string into 
> your DOM, you should get a parenthesized yen symbol in the output:
> 
>   XMLCh xmlStr[] = { '(', 0xA5, ')', chNull };
> 
> If you need cross-platform portability, this is definitely 
> the way to go.  If you look in XMLUni.cpp, you'll see dozens 
> of strings defined this way precisely because compiler 
> support for wide character strings is quite variable.
> 

