Alright, I've spent some time considering the char_traits issue, and it
seems like some of the methods are positively unfriendly to encodings that
need more than one code unit per character, for example:

    static void
    assign(char_type& __c1, const char_type& __c2)
    { __c1 = __c2; }

    static bool
    eq(const char_type& __c1, const char_type& __c2)
    { return __c1 == __c2; }

    static bool
    lt(const char_type& __c1, const char_type& __c2)
    { return __c1 < __c2; }

What happens when these get called on one half of a surrogate pair? I cannot
determine the less than / equal / etc. relationship between two characters
from a single code unit. If the methods were handed an array, alright, then
I could check whether the unit is a surrogate and, if so, which half, and
take care of it. But each call gives me only one unit to work with, so that
cannot work.
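
To make the problem concrete, here is a minimal sketch, assuming UTF-16 held
in char16_t units. The names is_surrogate, unit_lt and demo are made up for
illustration and are not part of char_traits. It shows a per-unit lt()
disagreeing with code point order, and a per-unit eq() being unable to tell
two different characters apart from their first unit:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not in char_traits): is u either half of a pair?
constexpr bool is_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDFFF; }

// What a per-unit lt() amounts to: compare two isolated code units.
constexpr bool unit_lt(char16_t a, char16_t b) { return a < b; }

inline void demo() {
    const std::u16string tilde = u"\uFF5E";      // U+FF5E, one unit: FF5E
    const std::u16string supp  = u"\U00010000";  // U+10000, two units: D800 DC00
    assert(tilde.size() == 1 && supp.size() == 2);
    assert(is_surrogate(supp[0]) && !is_surrogate(tilde[0]));

    // Per-unit order puts U+10000 first; code point order puts it last.
    assert(unit_lt(supp[0], tilde[0]));
    assert(char32_t{0xFF5E} < char32_t{0x10000});

    // U+1F600 and U+1F601 share the high surrogate D83D, so an eq() that
    // sees only the first unit cannot distinguish them.
    const std::u16string a = u"\U0001F600", b = u"\U0001F601";
    assert(a[0] == b[0] && a[1] != b[1]);
}
```

Calling demo() should pass every assertion, which is exactly the trouble: the
answers that lt and eq give per unit are well defined but are not answers
about characters.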
Here are some solutions, in worst to best order:
1) Ignore the surrogates. Goes against the whole point of even using
Unicode. I won't do it.
2) Use 32-bit codepoints. Very wasteful of resources (esp. RAM & I/O). I'll
do it and be ashamed.
3) Simulate 32-bit codepoints. Tell everybody outside that stuff is 32-bit
but in reality on the inside we combine the surrogate pairs in UTF-16
code unit sequences to get the 32-bit codepoints. The code would be hard to
maintain. Casts are evil.
4) Forget about actually using the real string container. Just use an
overloaded vector container that has all of the string container's methods.
As long as we maintain the interface everybody wants, who cares how it gets
done inside as long as it's fast. Isn't that how OO should be?
Any comments? I think an overloaded vector that has the same interface as
string would allow for the features we want without forcing us to comply
with string's model of one memory location per glyph.
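
For what it's worth, here is a rough sketch of what I mean by options 3 and
4 together. The class name utf16_text and its whole interface are
assumptions made up for illustration, not an existing library type. It
stores UTF-16 units in a vector, hands out 32-bit codepoints, and compares
by codepoint rather than by unit. It assumes callers pass valid codepoints
(no lone surrogates, nothing above 0x10FFFF), and indexing is linear time,
which a real version would want to improve:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical container: UTF-16 storage, code point interface.
class utf16_text {
public:
    // Append one code point as one or two UTF-16 units.
    // Assumes cp is a valid code point; no validation is done here.
    void push_back(char32_t cp) {
        if (cp < 0x10000) {
            units_.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            units_.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            units_.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }

    // Number of code points (not units). Linear time.
    std::size_t size() const {
        std::size_t n = 0;
        for (std::size_t i = 0; i < units_.size(); i += width_at(i)) ++n;
        return n;
    }

    // Number of stored UTF-16 units.
    std::size_t unit_count() const { return units_.size(); }

    // Code point at a logical index. Linear time; throws if out of range.
    char32_t at(std::size_t index) const {
        std::size_t i = 0;
        for (; i < units_.size() && index > 0; --index) i += width_at(i);
        if (i >= units_.size()) throw std::out_of_range("utf16_text::at");
        return decode_at(i);
    }

    // Lexicographic comparison by code point, not by code unit.
    friend bool operator<(const utf16_text& a, const utf16_text& b) {
        std::size_t i = 0, j = 0;
        while (i < a.units_.size() && j < b.units_.size()) {
            const char32_t x = a.decode_at(i), y = b.decode_at(j);
            if (x != y) return x < y;
            i += a.width_at(i);
            j += b.width_at(j);
        }
        return i >= a.units_.size() && j < b.units_.size();
    }

private:
    static bool is_high(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
    static bool is_low(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

    // 2 if a well-formed surrogate pair starts at i, otherwise 1.
    std::size_t width_at(std::size_t i) const {
        return (is_high(units_[i]) && i + 1 < units_.size()
                && is_low(units_[i + 1])) ? 2 : 1;
    }

    // Decode the code point starting at unit index i.
    char32_t decode_at(std::size_t i) const {
        if (width_at(i) == 2)
            return 0x10000
                 + ((static_cast<char32_t>(units_[i]) - 0xD800) << 10)
                 + (static_cast<char32_t>(units_[i + 1]) - 0xDC00);
        return units_[i];
    }

    std::vector<char16_t> units_;
};
```

With this, pushing 'A', U+1F600 and U+FF5E gives size() == 3 while
unit_count() == 4, and a text holding U+FF5E compares less than one holding
U+10000, which is the answer lt on single units could not give. The
surrogate arithmetic stays in two private functions, so the "hard to
maintain" part is at least confined to one place.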
Regards,
Matt