Re: Worst case scenarios on SCSU

David Starner Wed, 31 Oct 2001 17:45:03 -0800

On Wed, Oct 31, 2001 at 05:44:19PM -0600, David Starner wrote:
> UTF-16: This time, our worst case scenario is certain private use
> characters. Since certain private use characters take up 3 bytes (when
> encoded window-less) instead of two in UTF-16, preliminary guess is 3/2
> the size of UTF-16. It's susceptible to the same problem as above, only
> worse. Encoding all characters as either SDn window byte, SQU high
> low, or SCn byte, and using the reasoning above gets us
> = UTF-16 length * 3/2 * 61/62 + UTF-16 length * 1/62 + 16


Sorry, this is all wrong, as I forgot that some characters cannot be
put into windows. I find this case problematic, as a series of BMP Han
characters must be encoded in Unicode mode to get 2 bytes per character,
but the private-use characters must be encoded in UTF-16. 
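
To make the conflict concrete, here is a rough sketch of the two
per-character facts involved. It assumes the tag values and window
offset table from UTS #6 (Unicode-mode tag bytes 0xE0-0xF2, dynamic
windows positioned only at 0x0080..0x3380 and 0xE000..0xFF80) and
handles BMP code points only. The function names are made up for
illustration and are not taken from any encoder.

```python
# Sketch only: BMP code points, names are illustrative.

def windowable(cp: int) -> bool:
    """True if a BMP code point can fall inside a dynamic window."""
    # Offset bytes 0x01-0x67 place windows at 0x0080..0x3380 and
    # 0x68-0xA7 place them at 0xE000..0xFF80. Each window spans 128
    # code points, and the fixed-position offsets 0xF9-0xFF all land
    # inside these two ranges.
    return 0x0080 <= cp <= 0x33FF or 0xE000 <= cp <= 0xFFFF


def unicode_mode_cost(cp: int) -> int:
    """Bytes needed for one BMP code point while in Unicode mode."""
    # A high byte in 0xE0..0xF2 would be read as a tag, so the
    # character has to be prefixed with UQU (0xF0): 1 + 2 bytes.
    high = cp >> 8
    return 3 if 0xE0 <= high <= 0xF2 else 2


for cp in (0x4E00, 0xE000, 0xF2FF, 0xF300):
    print(f"U+{cp:04X} windowable={windowable(cp)} "
          f"unicode_mode_bytes={unicode_mode_cost(cp)}")
```

So U+4E00 cannot sit in a window but costs 2 bytes in Unicode mode,
while U+E000 can sit in a window but costs 3 bytes in Unicode mode.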

Some tests with the optimal SCSU encoder I'm working on give results
of 26 to 28 bytes for 10 randomly chosen characters in the
Unicode-mode tag range.
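
For scale, the two most naive strategies for the same input can be
counted by hand. This is only a back-of-the-envelope sketch, assuming
n BMP characters in U+E000..U+F2FF and a stream that starts in
single-byte mode; naive_costs is a made-up helper name, not part of
the encoder mentioned above.

```python
def naive_costs(n: int) -> dict:
    """Byte counts for n characters in U+E000..U+F2FF, no windows used."""
    return {
        # Single-byte mode, each character quoted: SQU + high + low.
        "squ_each": 3 * n,
        # One SCU to enter Unicode mode, then UQU + high + low each.
        "unicode_mode": 1 + 3 * n,
    }


print(naive_costs(10))
```

For n = 10 these come to 30 and 31 bytes, so the 26 to 28 bytes
reported above are a modest saving over either.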

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I saw a daemon stare into my face, and an angel touch my breast; each 
one softly calls my name . . . the daemon scares me less."
- "Disciple", Stuart Davis
