Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

> I've been experimenting with a collation algorithm that implements UCA
> using the same mechanism as the UCD decompositions, but with added (and
> sometimes modified) decompositions. This system creates the new "code
> points" needed to represent only <font> compatibility differences,
> ligatures, or alternate forms. Each existing compatibility character is
> decomposed into more basic characters that have primary differences
> in UCA, plus these new characters, which are given "variable"
> collation weights and so may be ignored by applications that ignore
> the extra levels. This encoding uses a 31-bit code space, which is
> highly compressible and can still be represented with the UTF-8 TES
> (although the values are not Unicode code points) or a similar ad hoc
> representation.

Please don't use UTF-8 to encode anything other than Unicode code
points.
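
To make the boundary concrete, here is a minimal Python sketch of the
check I have in mind. It assumes the current definition of UTF-8
(RFC 3629 and the Unicode Standard), which covers only U+0000..U+10FFFF
minus the surrogates and never uses more than four bytes. The names
is_unicode_scalar_value and encode_utf8_strict are hypothetical helpers
made up for this illustration, not part of any library.

```python
MAX_CODE_POINT = 0x10FFFF  # upper limit of the Unicode code space


def is_unicode_scalar_value(value: int) -> bool:
    """Return True if value is something UTF-8 is defined to encode."""
    if not 0 <= value <= MAX_CODE_POINT:
        return False
    # Surrogates (U+D800..U+DFFF) are code points, but they are not
    # scalar values, so well-formed UTF-8 never encodes them.
    return not 0xD800 <= value <= 0xDFFF


def encode_utf8_strict(value: int) -> bytes:
    """Encode one scalar value as UTF-8; reject everything else."""
    if not is_unicode_scalar_value(value):
        raise ValueError(f"0x{value:X} is not a Unicode scalar value")
    return chr(value).encode("utf-8")


if __name__ == "__main__":
    print(encode_utf8_strict(0x20AC))           # EURO SIGN, three bytes
    print(is_unicode_scalar_value(0x7FFFFFFF))  # a 31-bit private value

    # A six-byte sequence in the style of the obsolete 31-bit scheme is
    # rejected by a conforming decoder.
    try:
        b"\xfd\xbf\xbf\xbf\xbf\xbf".decode("utf-8")
    except UnicodeDecodeError as exc:
        print("rejected:", exc.reason)
```

A private 31-bit value fails the first check, and a conforming decoder
refuses the longer byte sequences such a value would need, so data of
that kind should travel under a different name and format.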

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

