Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

> I've tried to experiment a collation algorithm to implement UCA by the
> same system as used in UCD decompositions, but with added (and
> sometimes modified) decompositions. This system creates new "code
> points" needed to represent only <font> compatibility differences,
> ligatures, or alternate forms, as a decomposition of the existing
> compatibility character, into more basic characters exposed with
> primary differences in UCA, plus these new characters given "variable"
> collation weights, which may be ignorable in applications which ignore
> extra levels. This encoding uses a 31 bit code space, which is still
> highly compressible, but still representable with the UTF-8 TES (but
> they are not containing Unicode code points) or similar ad-hoc
> representation.

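To make the concern concrete, here is a rough sketch (not Philippe's actual code, which I have not seen) of what serializing a 31-bit value with the UTF-8 bit layout looks like, and how a conforming decoder reacts to it. The function names encode_31bit and is_standard_utf8 are made up for illustration, and the layout assumed is the original six-byte one described in RFC 2279.

```python
def encode_31bit(value: int) -> bytes:
    """Serialize a value below 2**31 using the original six-byte UTF-8 bit layout.

    Hypothetical helper for illustration only; values above U+10FFFF
    are not Unicode code points, so the output is not UTF-8 for them.
    """
    if not 0 <= value < 2**31:
        raise ValueError("value must fit in 31 bits")
    if value < 0x80:
        return bytes([value])
    # (exclusive upper limit, sequence length, lead-byte marker)
    for limit, length, lead in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if value < limit:
            break
    out = []
    # Emit trailing bytes from the low end, six payload bits each.
    for _ in range(length - 1):
        out.append(0x80 | (value & 0x3F))
        value >>= 6
    out.append(lead | value)
    return bytes(reversed(out))


def is_standard_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for value in (0xE9, 0x10FFFF, 0x110000, 0x7FFFFFFF):
        encoded = encode_31bit(value)
        print(f"{value:#x}: {encoded.hex(' ')} accepted={is_standard_utf8(encoded)}")
```

Anything above 0x10FFFF comes out as a byte sequence that a strict decoder refuses, so data labeled as UTF-8 but carrying such values will not survive contact with conforming software.
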
Please don't use UTF-8 to encode anything other than Unicode code points.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

