On Fri May 19 09:38:23 CDT 2006, [EMAIL PROTECTED] wrote:
> perhaps there are actually two problems here:
> 1) how to get libdraw to map back from a sequence of combining characters
> to a character in the font that represents that sequence.

this is pretty easy.  the unicode standard provides canonical compositions.
i think it would be easier for libdraw to insist that string be given strings
that have been canonically composed.  perhaps a job for tcs.
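
a minimal sketch of what "canonically composed" means, in python rather than
anything libdraw or tcs actually does.  it assumes the standard unicodedata
module; the helper name compose is made up for illustration.

```python
import unicodedata

def compose(s):
    """return s in canonical composed form (unicode NFC)."""
    return unicodedata.normalize("NFC", s)

decomposed = "e\u0301"          # 'e' followed by U+0301 combining acute accent
composed = compose(decomposed)  # collapses to the single code point U+00E9

print(len(decomposed), len(composed))
```

after composition a font only needs a glyph for U+00E9; it never sees the
two-rune sequence.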

> 2) how to draw sequences of combining characters that don't exist in
> precombined form within unicode. it's quite possible that one might wish
> to provide pre-rendered glyphs for some of these sequences - the current
> font format can't deal with that.

the general case doesn't seem like it would yield a solution with a bitmap font.
sure you could put a circumflex on an "a".  but what about dashed letters like
ł?  drawing a dash through an arbitrary character gets to be a real pain.  

the good news is that solving #1 would take care of most problems.
unfortunately, some romanized versions of russian and vietnamese (i believe)
would still not work.  but we would get 80% of what we would like without the
pain of trying to treat bitmaps as if they were vector character descriptions
a la metafont.
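
to see which sequences composition alone leaves behind, something like the
following could be used.  it is only a sketch: the function name uncomposed
is hypothetical, and it treats any rune with a nonzero canonical combining
class as a combiner, which is an approximation.

```python
import unicodedata

def uncomposed(s):
    """return the names of combining marks that survive canonical composition."""
    nfc = unicodedata.normalize("NFC", s)
    return [unicodedata.name(c) for c in nfc if unicodedata.combining(c)]

print(uncomposed("a\u0302"))  # a + circumflex has a precomposed form
print(uncomposed("q\u0302"))  # q + circumflex does not, so the mark remains
```

an empty list means the font only needs ordinary precomposed glyphs; anything
else is a sequence that would still need overstriking or a pre-rendered glyph.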

> 
> another issue is dealing with code (e.g. libframe) that assumes that
> characters do not overstrike - i.e. that there's a 1-1 correspondence
> between Runes and glyphs.

charofpt would be a problem.  there would be some problems with picking a proper
endpoint for highlighting.  a break between the base and the combiners would
be a problem.  i think the largest problem here would be dealing with the
character height.  currently in libdraw a character's height is the font's
height.  this isn't true for many fonts we already have -- ÄÖÜ☺ tend to get
clipped with pelm because they are taller than the font file claims.  just
expanding the height of the font would look pretty funny in the absence of
taller characters.
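
one way to avoid breaking between a base and its combiners is to group runes
into clusters and snap selection endpoints to cluster boundaries.  the sketch
below is python, not libframe code; clusters and snap are made-up names, and
the "nonzero combining class" test is a simplification of real grapheme
segmentation.

```python
import unicodedata

def clusters(s):
    """split s into (start, end) rune ranges, each a base plus its combiners."""
    out = []
    for i, c in enumerate(s):
        if out and unicodedata.combining(c):
            # combiner: extend the previous cluster to include it
            out[-1] = (out[-1][0], i + 1)
        else:
            # base character (or a stray leading combiner): start a new cluster
            out.append((i, i + 1))
    return out

def snap(s, i):
    """move rune index i back to the start of the cluster that contains it."""
    for start, end in clusters(s):
        if start <= i < end:
            return start
    return len(s)

text = "e\u0301a"           # e, combining acute, a
print(clusters(text))       # two clusters covering three runes
print(snap(text, 1))        # index 1 is inside the first cluster
```

a highlight endpoint passed through snap can never land between the e and its
accent.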

> yet another is how one should deal with character-based indexing, for instance
> indexing in sam expressions - does /é/-#0+#1 point to the character after
> the unadorned e, or after the whole sequence?

there be dragons here.  the library of congress has a 100-page manual on
alphabetization of languages with roman letters.  different languages have
different rules (sometimes for the same codepoint); a language sometimes has
different rules for different codepoints.  then there are ligatures.  in
german ss and ß are sorted the same.

there are probably only two sensible ways to deal with this.  either strip
(or don't strip) all combiners and do a naive sort, or define some sort of
locale.
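
the "strip all combiners and do a naive sort" option might look roughly like
this.  it is a sketch in python with a hypothetical helper strip_combiners;
it ignores locale rules entirely, which is the whole point of the naive
approach.

```python
import unicodedata

def strip_combiners(s):
    """decompose s canonically (NFD) and drop every combining mark."""
    nfd = unicodedata.normalize("NFD", s)
    return "".join(c for c in nfd if not unicodedata.combining(c))

words = ["zebra", "école", "eclair", "apple"]

naive = sorted(words)                          # plain code point order
stripped = sorted(words, key=strip_combiners)  # compare with accents removed

print(naive)
print(stripped)
```

in plain code point order é sorts after z; with the combiners stripped, école
lands next to eclair.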

> it'd be nice to sort this issue out properly; surely it shouldn't be
> too hard?

i believe this is another entry for the "famous lies list," ranking somewhat
below "check's in the mail" and above "i have this friend who...."

- erik
