Re: How do I use Xerces strings?

Alberto Massari Thu, 09 Mar 2006 00:35:44 -0800

At 03:19 AM 3/9/2006 -0500, Steven T. Hatton wrote:

On Thursday 09 March 2006 02:08, David Bertoni wrote:


> What does this simple program output on your system and compiler?

wchar_t is 32 bits on my system.  I believe that a 16 bit storage unit will
under normal circumstances occupy a 32 bit memory location, but only use half
of it.


Have a look at http://issues.apache.org/jira/browse/XERCESC-1305

This is what Sun says wrt wchar_t: "Solaris uses Unicode (UTF-32) at wchar_t only if the current locale is Unicode/UTF-8 locale. In any other locale, the wchar_t is not in UTF-32. This is due to the fact that wchar_t is an opaque data type in POSIX and we have been supporting wchar_t long before Unicode in our systems. MSFT Windows declared that their wchar_t is Unicode when they created Windows NT as long as you define the _UNICODE macro in your VB/VC++ programs and that makes people think all wchar_t, regardless of platforms use Unicode but that's not really true."


In the end:
- XML requires that a parser manipulate Unicode

- DOM defines that the strings stored in the tree are UTF-16 (which encodes the entire Unicode set)

Hence, Xerces uses a 16-bit data type to store UTF-16, and cannot reuse wchar_t because Unix systems store different things in those 16/32 bits.
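To make the "16-bit data type holding UTF-16" point concrete, here is a minimal sketch in standard C++11 of how a single Unicode code point maps onto one or two 16-bit code units. The function name encode_utf16 is made up for this illustration, and char16_t is used only as a stand-in for a fixed 16-bit unit; this is not Xerces code and does not use the Xerces string type.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Xerces): encode one Unicode code point
// as UTF-16. char16_t is always a 16-bit unit, unlike wchar_t, whose width
// and contents vary by platform and locale.
std::u16string encode_utf16(char32_t cp) {
    // Precondition: a valid Unicode scalar value (no lone surrogates).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one code unit is enough.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split into a high and a low surrogate.
        char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

A character such as U+0041 takes one unit, while something outside the Basic Multilingual Plane such as U+1F600 takes two, so the 16-bit unit still covers all of Unicode.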

I would also say that your test worked only because you tested the standard ASCII characters, which all the character sets place at the same index (below 128); once you start testing extended ASCII or Japanese characters you will probably see different behaviours.
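A small sketch of why an ASCII-only test hides the problem: the same byte value decodes to the same Unicode character in every charset below 128, but not above it. The Charset enum and decode_byte function are invented for this example, and the Greek branch covers only the lowercase letters of ISO-8859-7 (0xE1 to 0xF9); treat it as an illustration rather than a complete transcoder.

```cpp
// Hypothetical, deliberately partial decoder for two single-byte charsets.
enum class Charset { Latin1, Greek };

char16_t decode_byte(unsigned char b, Charset cs) {
    if (b < 0x80) {
        // ASCII range: both charsets agree with Unicode.
        return static_cast<char16_t>(b);
    }
    if (cs == Charset::Latin1) {
        // ISO-8859-1 maps each byte to the code point of the same value.
        return static_cast<char16_t>(b);
    }
    // ISO-8859-7: lowercase Greek letters 0xE1..0xF9 map to U+03B1..U+03C9.
    if (b >= 0xE1 && b <= 0xF9) {
        return static_cast<char16_t>(0x03B1 + (b - 0xE1));
    }
    // Anything else is outside this sketch: return the replacement character.
    return 0xFFFD;
}
```

The byte 0x41 is 'A' in both charsets, but the byte 0xE9 is U+00E9 (e with acute) in ISO-8859-1 and U+03B9 (Greek iota) in ISO-8859-7, which is the kind of divergence a test limited to ASCII never exercises.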

> I'm not sure anyone has ever said that Xerces-C and Linux wchar_t "don't
> get along."  The problem is that Xerces-C encodes string data in UTF-16
> internally, and using wchar_t to hold UTF-16 code points is not portable.

Why does Xerces-C use a non-standard data type?  If my implementation doesn't
support a particular locale, and therefore does not use a 16 bit or larger
data type, then what are the chances that I would use Xerces-C to support
such a character set?

You don't use Xerces to support a character set; you use Xerces to parse an XML file that potentially holds data in a foreign language and is hence encoded in an encoding different from yours; the fact that you will not be able to print that content on the console should not limit your capability of processing it.


Alberto

Steven

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


