Doug Ewell writes:
> Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>
> > I've tried experimenting with a collation algorithm that implements UCA
> > with the same system as the UCD decompositions, but with added (and
> > sometimes modified) decompositions. This system creates new "code
> > points", needed only to represent <font> compatibility differences,
> > ligatures, or alternate forms, as a decomposition of the existing
> > compatibility character into more basic characters that carry
> > primary differences in UCA, plus these new characters, which are given
> > "variable" collation weights and may be ignorable in applications that
> > ignore extra levels. This encoding uses a 31-bit code space, which is
> > highly compressible and still representable with the UTF-8 TES
> > (although the values are not Unicode code points) or a similar ad-hoc
> > representation.
>
> Please don't use UTF-8 to encode anything other than Unicode code
> points.
As long as I use it only internally for intermediate processing, I can do what I want. For now it is just a convenient way to represent variable-size integers of up to 31 bits (in fact I use it to represent 32-bit signed integers whose two highest bits are equal). Of course, if I ever use it to represent something other than code points in published data or text, I will rename it and won't keep the same charset label. But this will very probably not be the most efficient representation (because of its byte-oriented splitting), and a more compact or easier-to-process serialization could require an alternate encoding scheme (or transfer syntax).
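A minimal sketch of the idea of reusing the UTF-8 byte layout as a variable-length integer encoding, assuming the original form of UTF-8 from RFC 2279, which allowed sequences of up to six bytes and so covered 31 bits. Current UTF-8 (RFC 3629) stops at four bytes and U+10FFFF, so standard decoders will reject the 5- and 6-byte sequences produced here. All function names are hypothetical. The signed mapping reflects one reading of "the two highest bits are equal", namely values in [-2^30, 2^30). Overlong forms are not rejected.

```python
def encode_31bit(value: int) -> bytes:
    """Encode 0 <= value < 2**31 with the RFC 2279 UTF-8 byte layout (1 to 6 bytes)."""
    if not 0 <= value < 1 << 31:
        raise ValueError("value must fit in 31 bits")
    if value < 0x80:
        return bytes([value])  # single byte, same as ASCII
    # (sequence length, exclusive upper bound, lead-byte prefix)
    for nbytes, limit, lead in ((2, 1 << 11, 0xC0), (3, 1 << 16, 0xE0),
                                (4, 1 << 21, 0xF0), (5, 1 << 26, 0xF8),
                                (6, 1 << 31, 0xFC)):
        if value < limit:
            break
    tail = []
    for _ in range(nbytes - 1):
        tail.append(0x80 | (value & 0x3F))  # continuation byte 10xxxxxx, low bits first
        value >>= 6
    return bytes([lead | value] + tail[::-1])  # lead byte holds the remaining high bits


def decode_31bit(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one sequence at data[pos]; return (value, next position).
    Overlong forms are accepted in this sketch."""
    lead = data[pos]
    if lead < 0x80:
        return lead, pos + 1
    if lead >= 0xFE:
        raise ValueError("0xFE and 0xFF are never lead bytes")
    # (sequence length, lead-byte prefix, mask for the payload bits in the lead byte)
    for nbytes, prefix, mask in ((6, 0xFC, 0x01), (5, 0xF8, 0x03),
                                 (4, 0xF0, 0x07), (3, 0xE0, 0x0F),
                                 (2, 0xC0, 0x1F)):
        if lead >= prefix:
            break
    else:
        raise ValueError("continuation byte where a lead byte was expected")
    if pos + nbytes > len(data):
        raise ValueError("truncated sequence")
    value = lead & mask
    for b in data[pos + 1:pos + nbytes]:
        if b & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (b & 0x3F)
    return value, pos + nbytes


def encode_all(values) -> bytes:
    return b"".join(encode_31bit(v) for v in values)


def decode_all(data: bytes) -> list[int]:
    values, pos = [], 0
    while pos < len(data):
        v, pos = decode_31bit(data, pos)
        values.append(v)
    return values


def signed_to_31bit(n: int) -> int:
    """Map a signed value whose bits 31 and 30 agree, i.e. in [-2**30, 2**30), to 31 bits."""
    if not -(1 << 30) <= n < 1 << 30:
        raise ValueError("value needs more than 31 bits")
    return n & 0x7FFFFFFF


def signed_from_31bit(u: int) -> int:
    """Inverse of signed_to_31bit: sign-extend from bit 30."""
    return u - (1 << 31) if u & (1 << 30) else u
```

For values that are valid Unicode scalar values, encode_31bit produces the same bytes as a standard UTF-8 encoder, which is exactly why such output should not carry a UTF-8 charset label once it holds anything else: a receiver cannot tell the two apart.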

