On Thursday 09 March 2006 14:14, David Bertoni wrote:
> Steven T. Hatton wrote:
> I guess I don't understand what you mean by "I believe an individual
> 16-bit XMLCh will occupy 32-bits of storage."  How can a 16-bit XMLCh
> ever occupy 32 bits of storage?

What is the CPU going to stick in the other 16 bits of a 32-bit word when it
stores a single XMLCh?

> I agree it's a big problem that you cannot use it with
> std::basic_string, but there's no reason why you can't use it with the
> other containers.  What other facilities do you want to use?

Well, I'm still learning the Standard Library, so I don't really know what I
can get out of std::basic_string. I know it has a bunch of searching and
manipulation functions. In all likelihood, I will end up using QString for my
UI. I'm working on a C++ project management infrastructure, and felt somewhat
compromised by having to rely on Qt. Not that I have anything against Qt. I
think it's fantastic. I just wanted to build the basics of the program using
Standard C++.

> UTF-16 is an encoding of the 10646/Unicode character set, and you've
> stated previously that the C++ standard does not talk about encodings:
>
> > The C++ Standard only specifies character sets. It does not specify
> > encodings.
>
> There is no requirement that a character specified with a universal
> character name be encoded in any particular way -- it's just another way
> to name a character.

There's an isomorphism in there somewhere which, in principle, could be
leveraged to bridge between the encodings. I'm not saying it would be worth
doing.

> My version of the standard also has this to say:
>
> "If the hexadecimal value for a universal character name is less than
> 0x20 or in the range 0x7F-0x9F (inclusive), or if the universal
> character name designates a character in the basic source character set,
> then the program is ill-formed."
>
> That restricts the usage of universal character names too severely for
> Xerces-C's purposes.

I am under the impression that the stipulation you quoted only applies to
character literals.
AFAIK Xerces-C doesn't support character literals of any kind. Correct?

What I really want to know is whether there is significant cost associated
with using UTF-16 with support for characters outside of the BMP. In some
operations that would require the program to sniff every code unit to detect
whether it is part of a multi-unit character. From thinking through scenarios,
it seems likely that you could get away with ignoring that aspect of the
encoding.

Steven
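
To make the cost question concrete, here is a minimal sketch of what "sniffing" for multi-unit characters could look like. It is only an illustration under stated assumptions: char16_t stands in for XMLCh (Xerces-C's actual typedef depends on the platform), and the helper names isHighSurrogate, isLowSurrogate and countCodePoints are hypothetical, not part of Xerces-C or the Standard Library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers; char16_t is an assumed stand-in for XMLCh.
// A high (lead) surrogate is a code unit in the range 0xD800-0xDBFF.
inline bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// A low (trail) surrogate is a code unit in the range 0xDC00-0xDFFF.
inline bool isLowSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Counts code points rather than code units. A well-formed surrogate
// pair counts as one character; an unpaired surrogate counts as one too.
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isHighSurrogate(s[i]) && i + 1 < s.size() &&
            isLowSurrogate(s[i + 1])) {
            ++i;  // skip the low half of the pair
        }
        ++n;
    }
    return n;
}
```

The extra work is a couple of range comparisons per code unit, and it only matters for operations that care about character boundaries (counting characters, indexing by character, splitting). Copying, concatenating, or comparing whole strings for equality can treat the data as plain 16-bit units.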
