On Tuesday, July 08, 2003 8:21 PM, Peter Kirk <[EMAIL PROTECTED]> wrote:
> On 08/07/2003 11:10, Philippe Verdy wrote:
>
> > Admit that your proposal of using a canonical decomposition would
> > still cause problems with all Unicode algorithms, and with XML
> > processing.
> >
> > Only a NFKD decomposition would make your proposed "ligature"
> > character workable for XML processing and Unicode algorithms,
> > including UCA, case mappings, UTF representations, etc...
>
> This proposal for a compatibility decomposition is a possible
> alternative, but it's not my proposal, it's yours. I was deliberately
> avoiding anything like this which is not compatible with existing
> texts. If canonical decomposition isn't going to work, which I'm
> still not 100% sure of if composition is blocked, then I will
> withdraw my proposal.

I don't see why a new code point allocation would be incompatible if it
uses a compatibility decomposition instead of a canonical decomposition.
You are the one who proposed this allocation. My reply was that a
composition exclusion only blocks recomposition: the character is still
canonically equivalent to its decomposition, so giving your proposed
precomposed character a canonical decomposition would not solve the
problem, just complicate it.

Suppose your character PATAH-HIRIQ is accepted, and is defined as being
canonically equivalent to the sequence <PATAH, HIRIQ>. Canonical
equivalence then allows any Unicode algorithm to decompose it to NFD as
the pair of characters PATAH and HIRIQ, which are immediately reordered
into HIRIQ then PATAH. The composition exclusion just forbids
recombining them into PATAH-HIRIQ. So the NFC form remains the sequence:

    <consonant, hiriq, patah>

And your proposed character is useless (it becomes a compatibility
character, not recommended, exactly like the "Greek Dialytika Tonos").

The only way to solve your problem is to give it only a compatibility
decomposition, which NFC and NFD do not apply, so the character is
neither decomposed nor reordered by them...
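
The reordering argument above can be checked with a short sketch. This
is only an illustration using Python's standard unicodedata module; the
choice of BET as the base consonant and of U+FB4F as the example of a
compatibility-only decomposition are my assumptions, not part of the
proposal under discussion.

```python
import unicodedata

# Existing characters only; the proposed PATAH-HIRIQ does not exist.
BET, HIRIQ, PATAH = "\u05D1", "\u05B4", "\u05B7"

# Canonical combining classes decide the order of adjacent marks.
hiriq_ccc = unicodedata.combining(HIRIQ)   # lower class, sorts first
patah_ccc = unicodedata.combining(PATAH)   # higher class, sorts second

# Text entered as <consonant, patah, hiriq> ...
typed = BET + PATAH + HIRIQ
# ... comes out of both canonical forms as <consonant, hiriq, patah>.
nfd = unicodedata.normalize("NFD", typed)
nfc = unicodedata.normalize("NFC", typed)
reordered = BET + HIRIQ + PATAH

# U+0344 has a canonical decomposition but is excluded from
# composition, so NFC never produces it again.
DIALYTIKA_TONOS = "\u0344"
dt_nfd = unicodedata.normalize("NFD", DIALYTIKA_TONOS)
dt_nfc = unicodedata.normalize("NFC", DIALYTIKA_TONOS)

# U+FB4F (HEBREW LIGATURE ALEF LAMED) has only a <compat>
# decomposition: NFC and NFD leave it alone, NFKD expands it.
ALEF_LAMED = "\uFB4F"
al_nfc = unicodedata.normalize("NFC", ALEF_LAMED)
al_nfd = unicodedata.normalize("NFD", ALEF_LAMED)
al_nfkd = unicodedata.normalize("NFKD", ALEF_LAMED)

if __name__ == "__main__":
    print("ccc(hiriq), ccc(patah):", hiriq_ccc, patah_ccc)
    print("NFD reorders typed text:", nfd == reordered)
    print("NFC keeps U+0344:", dt_nfc == DIALYTIKA_TONOS)
    print("NFC keeps U+FB4F:", al_nfc == ALEF_LAMED)
```

A character with a canonical decomposition to <PATAH, HIRIQ> would
behave like U+0344 here, while one with only a <compat> decomposition
would behave like U+FB4F.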

This would be, I think, the first accepted combining character with a
<compat> decomposition and not a canonical decomposition. In addition,
the Unicode stability policy would require that the defined <compat>
decomposition be given in canonical order. Look, for example, at the
many Arabic <compat> decompositions, which could not be made canonical
for the simple reason that the Unicode policy guarantees that
decompositions will be defined in canonical order, and that canonical
decompositions will only include a character pair whose second
character is not canonically decomposable...

-- Philippe.
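
For anyone who wants to look at the Arabic case mentioned above, here
is a rough sketch that inspects one such decomposition with Python's
unicodedata module. The helper name inspect_decomposition and the
choice of U+FC5E as the sample are assumptions made for illustration;
the check only reports whether the marks in the stored decomposition
appear in non-decreasing combining-class order.

```python
import unicodedata

def inspect_decomposition(ch):
    """Return (tag, decomposed characters, in_canonical_order) for ch.

    tag is the formatting tag such as "<compat>" or "<isolated>",
    or None when the decomposition is canonical or absent.
    """
    fields = unicodedata.decomposition(ch).split()
    tag = fields[0] if fields and fields[0].startswith("<") else None
    chars = [chr(int(f, 16)) for f in fields if not f.startswith("<")]

    # Canonical order: within a run of non-starters, combining
    # classes never decrease. A starter (class 0) resets the run.
    in_order = True
    previous = 0
    for c in chars:
        current = unicodedata.combining(c)
        if current != 0 and previous > current:
            in_order = False
        previous = current
    return tag, chars, in_order

# U+FC5E ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
sample = "\uFC5E"
tag, chars, in_order = inspect_decomposition(sample)

if __name__ == "__main__":
    print(unicodedata.name(sample))
    print("tag:", tag)
    print("code points:", [f"U+{ord(c):04X}" for c in chars])
    print("classes:", [unicodedata.combining(c) for c in chars])
    print("marks in canonical order:", in_order)
```

Because the tag is not canonical, NFC and NFD leave such a character
untouched; only the compatibility forms expand it.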

