On Wed, Oct 31, 2001 at 05:44:19PM -0600, David Starner wrote:
> UTF-16: This time, our worst case scenario is certain private use
> characters. Since certain private use characters take up 3 bytes (when
> encoded window-less) instead of two in UTF-16, preliminary guess is 3/2
> the size of UTF-16. It's susceptible to the same problem as above, only
> worse. Encoding all characters as either SDn window byte, SQU high
> low, or SCn byte, and using the reasoning above gets us
> = UTF-16 length * 3/2 * 61/62 + UTF-16 length * 1/62 + 16

Sorry, this is all wrong, as I forgot that some characters cannot be put
into windows. I find this case problematic, as a series of BMP Han
characters must be encoded in Unicode mode to get 2 bytes per character,
but the private-use characters must be encoded in UTF-16.

Some tests with the optimal SCSU encoder I'm working on give results
between 26 and 28 bytes for 10 randomly chosen characters in the Unicode
mode tag range.

--
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I saw a daemon stare into my face, and an angel touch my breast; each
one softly calls my name . . . the daemon scares me less."
- "Disciple", Stuart Davis
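
As a rough illustration of why the case described above is awkward: in
SCSU's Unicode mode the lead bytes 0xE0-0xF2 are reserved as tags, so a
character whose high byte falls in that range (U+E000..U+F2FF, all
private use) has to be quoted with UQU and costs 3 bytes instead of 2.
The sketch below only counts bytes for text kept entirely in Unicode
mode. It is not the optimal encoder mentioned in the message, and
unicode_mode_length is a made-up name used for illustration only.

```python
# Hypothetical helper for illustration; not part of any SCSU library.
# Assumption: the whole string is encoded in SCSU Unicode mode, with one
# SCU tag byte up front to switch out of the initial single-byte mode.

TAG_LOW, TAG_HIGH = 0xE0, 0xF2  # Unicode-mode tag bytes (UC0 .. Urs)


def unicode_mode_length(text: str, include_switch: bool = True) -> int:
    """Count the bytes needed to hold `text` entirely in Unicode mode."""
    total = 1 if include_switch else 0  # the SCU switch byte
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Supplementary characters go out as a surrogate pair;
            # surrogate lead bytes (0xD8-0xDF) are not tag bytes.
            total += 4
        elif TAG_LOW <= (cp >> 8) <= TAG_HIGH:
            # High byte collides with a tag: UQU + high + low.
            total += 3
        else:
            # Ordinary BMP character: high + low.
            total += 2
    return total


if __name__ == "__main__":
    han = "\u4e00\u4e01\u4e03"   # BMP Han: 2 bytes each
    pua = "\ue000" * 10          # private use in the tag range: 3 bytes each
    print(unicode_mode_length(han))
    print(unicode_mode_length(pua))
```

For ten characters in the tag range this naive count comes to 31 bytes
(30 plus the switch byte), so the 26 to 28 bytes reported above
presumably come from the encoder mixing in window-based encoding where a
few of the characters happen to share a window.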

