At 03:43 PM 1/14/00 -0800, Leonard Rosenthol wrote:
>At 1:14 PM -0800 1/14/00, Paul Rohr wrote:
>>1.  Character sequence normalization.  (reasonable)
>>---------------------------------------------------
>>Thus, there needs to be work done (probably at input time) to normalize
>>those sequences of combining characters, and perhaps ignore invalid ones.
>
>       If you use the standard OS input methods, they will handle 
>all this for you - in fact, they will also handle a number of other 
>input issues that are pretty complex for some languages (especially 
>CJK).

Of course, we'd love to take advantage of OS-level input methods wherever 
possible, but I'm less confident than you are that these will be sufficient 
in all cases.  

I'm eager to see the code that proves me wrong.  :-)

>>(Otherwise, the variant sequences will make features like spell-check
>>prohibitively unreliable.)
>
>       And also search & replace.   The whole "combined characters" 
>in Unicode issue is an interesting one, especially when doing things 
>like regular expression searches.

Yep.  That's another good reason for normalization.  
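A minimal sketch of what input-time normalization could look like. It uses Python's standard `unicodedata` module as a stand-in for whatever the editor would actually use. `normalize_input` is a hypothetical helper. It converts text to NFC and drops combining marks that have no base character to attach to, which is one possible reading of "ignore invalid ones." After that, a plain substring or regex search for a precomposed "é" finds text that was typed as "e" plus a combining acute accent.

```python
import re
import unicodedata


def is_mark(ch: str) -> bool:
    """True for combining marks (Unicode general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def normalize_input(text: str) -> str:
    """Hypothetical input-time filter: NFC-normalize, then drop orphaned marks.

    A mark counts as orphaned if it starts the text or follows a separator
    (Z*) or control/format character (C*), since it has no base to combine with.
    """
    composed = unicodedata.normalize("NFC", text)
    out: list[str] = []
    for ch in composed:
        if is_mark(ch):
            prev = out[-1] if out else None
            if prev is None or unicodedata.category(prev)[0] in "ZC":
                continue  # no base character: ignore this mark
        out.append(ch)
    return "".join(out)


# The same word entered two ways: precomposed U+00E9 vs. "e" + U+0301.
typed = "the cafe\u0301 is open"
query = "caf\u00e9"

print(query in typed)                                        # False: raw sequences differ
print(query in normalize_input(typed))                       # True after NFC
print(re.search(query, normalize_input(typed)) is not None)  # True
print(normalize_input("\u0301abc"))                          # abc (leading mark dropped)
```

Normalizing once at input time means spell-check, search, and replace can all compare code points directly instead of each re-implementing equivalence rules.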

>>2.  Combining characters -- position.  (???)
>>--------------------------------------------
>>The current code assumes that every Unicode character will occupy one cell
>>of display space of a known width.  However, languages like Thai render
>>sequences of several characters into the same display cell.
>
>       Since Unicode only has a single code point for any valid 
>glyph, your input handler should be converting the multiple 
>characters into the new composite glyph value and then you only have 
>one character to display.

Some scripts may indeed have code points for all the composite glyphs 
they need.  However, as far as I can tell, this is *not* true for Thai. 

  http://charts.unicode.org/Unicode.charts/normal/U0E00.html
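One way to test the claim against the Unicode data itself is sketched below with Python's `unicodedata`, though any library that exposes NFC would do. NFC composes Latin "e" + U+0301 into the single code point U+00E9. It leaves Thai KO KAI (U+0E01) + SARA I (U+0E34) as two code points, because Unicode defines no precomposed character for that pair.

```python
import unicodedata

latin = "e\u0301"        # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
thai = "\u0e01\u0e34"    # THAI CHARACTER KO KAI + THAI CHARACTER SARA I

for label, seq in (("Latin", latin), ("Thai", thai)):
    nfc = unicodedata.normalize("NFC", seq)
    codes = " ".join(f"U+{ord(c):04X}" for c in nfc)
    print(f"{label}: {len(seq)} code points -> NFC {codes}")

# Latin: 2 code points -> NFC U+00E9
# Thai: 2 code points -> NFC U+0E01 U+0E34
```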

As far as I can tell, the following combining characters need to be 
composited with one or more other characters at rendering time:

  0E31
  0E34 - 0E3A
  0E47 - 0E4E

Am I missing something here?  
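The list above can be cross-checked mechanically. The minimal sketch below assumes the Unicode database bundled with the Python interpreter is representative. It collects every code point in the Thai block (U+0E00 to U+0E7F) whose general category is Mn (nonspacing mark) and compares the result with the three ranges listed.

```python
import unicodedata

# Ranges from the list above, inclusive.
listed = {0x0E31} | set(range(0x0E34, 0x0E3B)) | set(range(0x0E47, 0x0E4F))

# Every nonspacing mark (general category Mn) in the Thai block.
marks = {cp for cp in range(0x0E00, 0x0E80) if unicodedata.category(chr(cp)) == "Mn"}

print(marks == listed)  # True
for cp in sorted(marks):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```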

>>4.  Combining characters -- rendering.  (???, platform-specific)
>>----------------------------------------------------------------
>>On each platform, someone will need to investigate whether the
>>text-rendering primitives know how to properly combine a character sequence
>>into a single glyph.  If so, drawing should be pretty easy.  If not, adding
>>logic to do all that rendering from the constituent glyphs in the font may
>>be difficult. 
>>
>       Again, if you use the single combined glyph code point, it 
>should work just fine when rendered.

Again, this sounds wonderfully convenient, but it only works where a 
single combined code point exists, and as far as I can tell (see above), 
Thai doesn't have them.  
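Whether or not a platform's text primitives can draw a combined sequence, the layout code still has to stop assuming one code point per display cell. The sketch below shows that grouping step in Python. It simplifies on purpose: full Unicode grapheme segmentation (UAX #29) is not implemented. It just assumes each combining mark (categories Mn/Mc/Me) belongs to the cell of the preceding character. `display_cells` is a hypothetical name. Each resulting cluster could then be handed to the platform renderer as a unit, or drawn from its constituent glyphs where the renderer can't combine them.

```python
import unicodedata


def display_cells(text: str) -> list[str]:
    """Group code points into display cells (simplified clustering).

    A combining mark joins the cell of the character before it; anything
    else starts a new cell. This approximates, but is not, UAX #29.
    """
    cells: list[str] = []
    for ch in text:
        if cells and unicodedata.category(ch).startswith("M"):
            cells[-1] += ch   # stack the mark onto the previous cell
        else:
            cells.append(ch)  # base character (or orphan mark) opens a new cell
    return cells


word = "\u0e01\u0e34\u0e48\u0e07"  # KO KAI + SARA I + MAI EK + NGO NGU
cells = display_cells(word)
print(len(word), "code points,", len(cells), "cells")  # 4 code points, 2 cells
print([" ".join(f"U+{ord(c):04X}" for c in cell) for cell in cells])
```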

Paul


