On Thursday 09 March 2006 12:16, David Bertoni wrote:
> Steven T. Hatton wrote:
> > wchar_t is 32 bits on my system. I believe that a 16 bit storage unit
> > will under normal circumstances occupy a 32 bit memory location, but only
> > use half of it.
>
> Yes, and don't you think that's rather wasteful? Would you use Xerces-C
> to process large XML documents if you knew it was wasting half of its
> character string storage just so it could use wchar_t on all platforms?

Actually, I did not state my intended meaning well, and I have now come to understand that I was in error. I was thinking in terms of individual units of storage, i.e., individual characters as opposed to containers. Containers (at least sequential containers) are basically arrays under the hood, so they do store data contiguously. I believe an individual 16-bit XMLCh will occupy 32 bits of storage, but that is probably a fairly rare animal, and therefore not worth consideration.

> > Why does Xerces-C use a non-standard data type?
>
> unsigned short is not a non-standard type. You may think it's
> "non-standard" for holding character data, but Xerces-C encodes
> character data in UTF-16 code units, and that requires a 16-bit integral
> type.

It is (AFAIK) not one of the character types supported by my Standard Library implementation. That is my point. I cannot seamlessly use it with the facilities provided by the C++ Standard Library.

> > If my implementation doesn't support a particular locale, and
> > therefore does not use a 16 bit or larger data type, then what are the
> > chances that I would use Xerces-C to support such a character set?
>
> You've got it backwards -- Xerces-C only support the current locale's
> character set in a very limited fashion, by providing a way to transcode
> UTF-16 strings to character strings in the current locale. Otherwise,
> it operates internally exclusively in UTF-16, and it is unaffected by
> the current locale or how the system encodes char or wchar_t.

According to the standard, a C++ implementation must provide a wchar_t large enough to hold every character of the largest extended character set among the locales it supports. Combining that requirement with the requirement that the implementation support character literals of the extended character set using the naming specified by ISO/IEC 10646:2000, I conclude that the requirement is virtually identical to a requirement that it support UTF. But I won't go so far as to say UTF-16.

Steven
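
P.S. For anyone who wants to check the storage question on their own platform, here is a minimal sketch. It assumes XMLCh is unsigned short, as David describes; Utf16Unit is my own hypothetical stand-in for it, and the helper names are made up for illustration. None of this is Xerces-C API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for XMLCh. The real typedef is chosen per
// platform by Xerces-C, so treat unsigned short as an assumption.
typedef unsigned short Utf16Unit;

// Bytes used by n code units stored contiguously, as in an array
// or a sequential container's buffer (no per-element padding).
std::size_t packedBytes(std::size_t n) { return n * sizeof(Utf16Unit); }

// Bytes the same n units would need if each were stored as wchar_t.
std::size_t widenedBytes(std::size_t n) { return n * sizeof(wchar_t); }

// Copy code units into a std::wstring so the Standard Library's wide
// string facilities can be used on them. This is a unit-by-unit copy,
// not a transcoding: surrogate pairs stay as two separate units, so
// the result is only meaningful as-is for characters in the BMP.
std::wstring widen(const std::vector<Utf16Unit>& units) {
    return std::wstring(units.begin(), units.end());
}
```

Comparing packedBytes(n) with widenedBytes(n) shows what storing the same text as wchar_t would cost. Where wchar_t is 32 bits and unsigned short is 16, the widened figure should come out at twice the packed one.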
