>> yet another is how one should deal with character-based indexing, for
>> instance indexing in sam expressions - does /é/-#0+#1 point to the
>> character after the unadorned e, or after the whole sequence?
>thair be dragons here. the library of congress has a 100-page manual on
>alphabetization of languages with roman letters. different languages have
>different rules (sometimes for the same codepoint); a language sometimes
>has different rules for different codepoints.
>then there are ligatures. in german ss and ß are sorted the same.

uff. this answer doesn't fit the question. i think base+combiner* should be
treated as an indivisible character. but again, if we use canonical
compositions, this case can be avoided except in cases where the character
can't be drawn anyway.

- erik
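
To make the two options concrete, here is a rough sketch in Python (not sam or Plan 9 code) using the standard unicodedata module. The helper name clusters is made up for illustration, and it treats any codepoint with a nonzero canonical combining class as a combiner, which is a simplification of real grapheme cluster rules. It shows that canonical composition (NFC) collapses e + combining acute into one codepoint, while a pair such as q + combining acute has no precomposed form and still needs the base+combiner* grouping.

```python
import unicodedata


def clusters(text):
    """Group each base codepoint with the combiners that follow it.

    Simplification: a codepoint counts as a combiner when its canonical
    combining class is nonzero. Real grapheme segmentation has more rules.
    """
    out = []
    for ch in text:
        if out and unicodedata.combining(ch) != 0:
            out[-1] += ch      # attach the combiner to the preceding base
        else:
            out.append(ch)     # start a new base+combiner* unit
    return out


decomposed = "e\u0301x"        # e, COMBINING ACUTE ACCENT, x
composed = unicodedata.normalize("NFC", decomposed)

# Counting codepoints, "+#1" after the e would land between e and the accent.
print(len(decomposed))         # 3 codepoints
# After canonical composition the accent is no longer a separate codepoint.
print(len(composed))           # 2 codepoints
# Counting base+combiner* units, "+#1" lands after the whole sequence.
print(len(clusters(decomposed)))   # 2 units

# q + combining acute has no precomposed codepoint, so NFC leaves it alone.
stubborn = unicodedata.normalize("NFC", "q\u0301")
print(len(stubborn), len(clusters(stubborn)))
```

With the grouping in place, the answer to the indexing question no longer depends on whether the text happens to be composed or decomposed.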
