Has anyone worked out worst-case scenarios for SCSU, relative to other methods of encoding Unicode characters?
The numbers I've got are:

UTF-32: Since every character (including any necessary state changes) can be encoded in four bytes, and four bytes would be necessary for a supplementary character outside any current window, the worst case for short strings is an optimal SCSU length equal to the UTF-32 length. In the long run, though, we must account for the windows, as an optimal sequence will probably look like

  SQX foo bar baz SQX foo bar baz SCn byte SQX foo bar baz . . .

SCSU length
  = UTF-32 length * (fraction of astral characters that cannot be covered by 7 windows)
  + UTF-32 length * 2/4 * (fraction of astral characters covered by 7 windows)
  + 2 bytes * 7 windows (to initially set up the windows)
  = UTF-32 length * 8185/8192 + UTF-32 length * 7/16384 + 14
  = UTF-32 length * 16377/16384 + 14

(actually, the minimum of this and the UTF-32 length).

UTF-16: This time, the worst case is certain private use characters. Since these take up 3 bytes (when encoded window-less) instead of the two they take in UTF-16, a preliminary guess is 3/2 the size of UTF-16. It's susceptible to the same problem as above, only worse. Encoding all characters as either SDn window byte, SQU high low, or SCn byte, and using the reasoning above, gets us

SCSU length
  = UTF-16 length * 3/2 * 61/62 + UTF-16 length * 1/62 + 16

(This bound may be somewhat weak, as increasing the ratio of private use characters makes windows more useful, and decreasing it makes Unicode mode more useful.)

UTF-8: The worst case is a series of C0 control characters such as U+0001 (not NUL itself, which SCSU passes through as a single byte along with tab, LF, and CR, but the controls that have to be quoted). Since this gives us a string with twice the length of the corresponding UTF-8 string, it can't be windowized, and there are no other characters with much, if any, expansion, I'd say the worst case is 2 * the UTF-8 length.

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I saw a daemon stare into my face, and an angel touch my breast; each one softly calls my name . . . the daemon scares me less."
- "Disciple", Stuart Davis
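For anyone who wants to check the arithmetic, here is a minimal Python sketch of the three estimates from the post. It only evaluates the formulas as written and does not run an SCSU encoder. The function names are made up for illustration, lengths are in bytes, and the reading of the "+ 16" term as 2 bytes for each of 8 windows is an assumption.

```python
from fractions import Fraction

# Fractions taken straight from the post.
ASTRAL_COVERED = Fraction(7, 8192)   # astral characters reachable through 7 windows
BMP_COVERED = Fraction(1, 62)        # characters reachable through windows (UTF-16 case)


def scsu_worst_vs_utf32(utf32_len):
    """Estimated worst-case SCSU length (bytes) for a given UTF-32 length."""
    estimate = (
        utf32_len * (1 - ASTRAL_COVERED)                # 4 bytes per character, no gain
        + utf32_len * Fraction(2, 4) * ASTRAL_COVERED   # 2 bytes instead of 4
        + 2 * 7                                         # window setup, per the post
    )
    # The post notes the true bound is the minimum of this and the UTF-32 length.
    return min(estimate, Fraction(utf32_len))


def scsu_worst_vs_utf16(utf16_len):
    """Estimated worst-case SCSU length (bytes) for a given UTF-16 length."""
    return (
        utf16_len * Fraction(3, 2) * (1 - BMP_COVERED)  # 3 bytes instead of 2
        + utf16_len * BMP_COVERED                       # same size as UTF-16
        + 16                                            # assumed: 2 bytes * 8 windows
    )


def scsu_worst_vs_utf8(utf8_len):
    """Worst case against UTF-8: every character doubles in size."""
    return 2 * utf8_len


if __name__ == "__main__":
    for n in (4, 32768, 65536):
        print("UTF-32", n, "->", float(scsu_worst_vs_utf32(n)))
    print("UTF-16", 124, "->", float(scsu_worst_vs_utf16(124)))
    print("UTF-8 ", 10, "->", scsu_worst_vs_utf8(10))
```

One consequence of the UTF-32 formula as written: the 14 setup bytes are only paid back once the UTF-32 length exceeds 14 * 16384/7 = 32768 bytes, so below that size the estimate is simply the UTF-32 length.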

