Re: How do I use Xerces strings?

Steven T. Hatton Thu, 09 Mar 2006 10:25:13 -0800

On Thursday 09 March 2006 12:16, David Bertoni wrote:
> Steven T. Hatton wrote:


> > wchar_t is 32 bits on my system.  I believe that a 16 bit storage unit
> > will under normal circumstances occupy a 32 bit memory location, but only
> > use half of it.
>
> Yes, and don't you think that's rather wasteful?  Would you use Xerces-C
> to process large XML documents if you knew it was wasting half of its
> character string storage just so it could use wchar_t on all platforms?

Actually, I did not state my intended meaning well, and I have now come to 
understand that I was in error.  I was thinking in terms of individual units 
of storage, i.e., individual characters as opposed to containers.  Containers 
(at least sequential containers) are basically arrays under the hood, so they 
do store data contiguously.  I believe an individual 16-bit XMLCh will occupy 
32 bits of storage, but a lone XMLCh is probably a fairly rare animal, and 
therefore not worth considering.
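
To make the distinction concrete, here is a minimal sketch.  It assumes a 
16-bit code unit typedef'd as unsigned short (Utf16Unit is a hypothetical 
stand-in for XMLCh, not the Xerces-C header), and LoneUnit is an invented 
struct that only exists to show padding.  Arrays are guaranteed to have no 
padding between elements; whether a lone unit is padded out depends on what 
sits next to it and on the platform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for XMLCh; assumed to be unsigned short here.
typedef unsigned short Utf16Unit;

// A lone code unit next to a wider member: the compiler may insert padding.
struct LoneUnit {
    Utf16Unit unit;
    int       other;
};

// Bytes used by an array of n code units (no padding between elements).
inline std::size_t arrayBytes(std::size_t n) {
    return n * sizeof(Utf16Unit);
}

// Bytes of padding the compiler added to LoneUnit (may be zero).
inline std::size_t lonePadding() {
    return sizeof(LoneUnit) - sizeof(Utf16Unit) - sizeof(int);
}

// An array of 8 units occupies exactly 8 * sizeof(Utf16Unit) bytes.
inline void checkContiguous() {
    Utf16Unit buffer[8] = {0};
    assert(sizeof(buffer) == arrayBytes(8));
}
```

Calling lonePadding() on a given platform shows how much, if anything, is 
wasted around the single unit; the array case wastes nothing.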

> > Why does Xerces-C use a non-standard data type?
>
> unsigned short is not a non-standard type.  You may think it's
> "non-standard" for holding character data, but Xerces-C encodes
> character data in UTF-16 code units, and that requires a 16-bit integral
> type.

It is (AFAIK) not one of the datatypes supported by my Standard Library 
implementation.  That is my point.  I cannot seamlessly use it with the 
facilities provided by the C++ Standard Library.
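
As a rough illustration of where the seam lies: the generic containers and 
algorithms accept any integral type, while the string and stream facilities 
are only specified for char and wchar_t.  The sketch below again assumes an 
unsigned short code unit (Utf16Unit is a hypothetical stand-in for XMLCh), 
and unitLength, toVector and countUnit are names invented for this example, 
not Xerces-C functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for XMLCh; assumed to be unsigned short here.
typedef unsigned short Utf16Unit;

// Length of a zero-terminated sequence of code units (what strlen does for char).
inline std::size_t unitLength(const Utf16Unit* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Copy a zero-terminated sequence into a vector, dropping the terminator.
inline std::vector<Utf16Unit> toVector(const Utf16Unit* s) {
    return std::vector<Utf16Unit>(s, s + unitLength(s));
}

// Count occurrences of one code unit with a generic algorithm.
inline std::size_t countUnit(const std::vector<Utf16Unit>& v, Utf16Unit u) {
    return static_cast<std::size_t>(std::count(v.begin(), v.end(), u));
}

// Small usage check: "aba" has three units, two of which are 'a' (0x61).
inline void checkVectorUse() {
    const Utf16Unit text[] = {0x61, 0x62, 0x61, 0};
    std::vector<Utf16Unit> v = toVector(text);
    assert(v.size() == 3);
    assert(countUnit(v, 0x61) == 2);
}
```

This works, but notice that the length function had to be written by hand, 
and nothing here can be passed to an ostream or compared with a std::string.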

> > If my implementation doesn't support a particular locale, and
> > therefore does not use a 16 bit or larger data type, then what are the
> > chances that I would use Xerces-C to support such a character set?
>
> You've got it backwards -- Xerces-C only supports the current locale's
> character set in a very limited fashion, by providing a way to transcode
> UTF-16 strings to character strings in the current locale.  Otherwise,
> it operates internally exclusively in UTF-16, and it is unaffected by
> the current locale or how the system encodes char or wchar_t.

According to the standard, the C++ implementation must use a wchar_t large 
enough to hold all the characters used by that locale.  Combining that 
requirement with the requirement that the implementation support the 
character literals of the extended character set using the naming specified 
by ISO/IEC 10646:2000, I conclude that the requirement is virtually identical 
to the requirement that it support some UTF encoding.  But I won't go so far 
as to say UTF-16.
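
How wide wchar_t actually is can be checked directly.  The following is only 
a sketch; wcharBits and fitsInWchar are names made up for this example, and 
the results differ between implementations (a 16-bit wchar_t cannot hold a 
code point above U+FFFF in one unit, a 32-bit one can).

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <limits>

// Number of bits in wchar_t on this implementation.
inline std::size_t wcharBits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// True if a single wchar_t can hold the given code point value.
inline bool fitsInWchar(unsigned long codePoint) {
    return codePoint <=
           static_cast<unsigned long>(std::numeric_limits<wchar_t>::max());
}

// Sanity checks that hold on any conforming implementation.
inline void checkBasics() {
    assert(wcharBits() >= CHAR_BIT);
    assert(fitsInWchar(0x41UL));  // 'A' fits everywhere
}
```

On a system like mine, with a 32-bit wchar_t, fitsInWchar(0x10FFFF) would be 
expected to return true; with a 16-bit wchar_t it would return false.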

Steven

