Re: Yerushala(y)im - or Biblical Hebrew

Philippe Verdy Tue, 08 Jul 2003 13:41:04 -0700

On Tuesday, July 08, 2003 8:21 PM, Peter Kirk <[EMAIL PROTECTED]> wrote:

> On 08/07/2003 11:10, Philippe Verdy wrote:
> 
> > Admit that your proposal of using a canonical decomposition would
> > still cause problems with all Unicode algorithms, and with XML
> > processing.
> > 
> > Only a NFKD decomposition would make your proposed "ligature"
> > character workable for XML processing and Unicode algorithms,
> > including UCA, case mappings, UTF representations, etc...
>
> This proposal for a compatibility decomposition is a possible
> alternative, but it's not my proposal, it's yours. I was deliberately
> avoiding anything like this which is not compatible with existing
> texts. If canonical decomposition isn't going to work, which I'm
> still not 100% sure of if composition is blocked, then I will
> withdraw my proposal. 

I don't see why a new code point allocation would be incompatible
if it uses a compatibility decomposition instead of a canonical
decomposition. It was you who proposed this allocation, but I
replied that a composition exclusion blocks recomposition for *any*
canonically equivalent decomposition of a character, and thus
any canonical decomposition of your proposed precomposed
character would not solve the problem, just complicate it:

Suppose your character PATAH-HIRIQ is accepted, and is
defined as being canonically equivalent to the sequence
<PATAH, HIRIQ>. Then, by the definition of canonical
equivalence, any Unicode algorithm may decompose it
to NFD as the pair of characters PATAH and HIRIQ, which
are then immediately reordered into HIRIQ then PATAH
(HIRIQ has combining class 14, PATAH has 17).
The composition exclusion just forbids recombining them
into PATAH-HIRIQ.
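The reordering step can be checked with Python's standard `unicodedata` module. Since PATAH-HIRIQ has no code point, this is only a sketch that assumes its canonical decomposition would be the pair <U+05B7 PATAH, U+05B4 HIRIQ> and normalizes a consonant followed by that pair. The helper `canonical_reorder` is a hypothetical, simplified illustration of the canonical ordering rule, cross-checked against the library's NFD.

```python
import unicodedata

ALEF = "\u05d0"    # HEBREW LETTER ALEF, an arbitrary base consonant
PATAH = "\u05b7"   # HEBREW POINT PATAH, canonical combining class 17
HIRIQ = "\u05b4"   # HEBREW POINT HIRIQ, canonical combining class 14

# Assumption: the proposed PATAH-HIRIQ would decompose canonically to <PATAH, HIRIQ>.
HYPOTHETICAL_DECOMPOSITION = PATAH + HIRIQ


def canonical_reorder(s: str) -> str:
    """Hypothetical helper: stable-sort each run of non-starters by combining class.

    Only the reordering step is modeled; no decomposition is performed.
    """
    out, run = [], []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run of combining marks.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


text = ALEF + HYPOTHETICAL_DECOMPOSITION
nfd = unicodedata.normalize("NFD", text)
nfc = unicodedata.normalize("NFC", text)

print([hex(ord(c)) for c in nfd])  # ['0x5d0', '0x5b4', '0x5b7']: HIRIQ now precedes PATAH
assert nfd == canonical_reorder(text) == ALEF + HIRIQ + PATAH
assert nfc == ALEF + HIRIQ + PATAH  # NFC keeps the reordered sequence
```

Python's `sorted` is stable, so marks with equal combining classes keep their relative order, as canonical ordering requires.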

So the NFC form remains the sequence <consonant, hiriq, patah>,
and your proposed character is useless (it becomes a
compatibility character, not recommended, exactly like
the "Greek Dialytika Tonos", U+0344).
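The existing precedent can be observed directly with Python's `unicodedata` module: U+0344 COMBINING GREEK DIALYTIKA TONOS has a canonical decomposition but is excluded from composition, so no normalization form ever produces it. This sketch only inspects the standard library's Unicode data; the last line shows that, after a base letter, NFC reaches an ordinary precomposed letter instead of U+0344.

```python
import unicodedata

DIALYTIKA_TONOS = "\u0344"  # COMBINING GREEK DIALYTIKA TONOS

# Its canonical decomposition is <U+0308 COMBINING DIAERESIS, U+0301 COMBINING ACUTE ACCENT>.
print(unicodedata.decomposition(DIALYTIKA_TONOS))  # 0308 0301

# Every normalization form decomposes it and none recomposes it.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, DIALYTIKA_TONOS) == "\u0308\u0301"

# After GREEK SMALL LETTER IOTA, NFC composes to U+0390 rather than keeping U+0344.
assert unicodedata.normalize("NFC", "\u03b9\u0344") == "\u0390"
```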

The only way to solve your problem is to give it only a
compatibility decomposition, which is ignored by NFC and NFD
decomposition and reordering... This would be, I think,
the first accepted combining character with a <compat>
decomposition and not a canonical decomposition.
In addition, the Unicode stability policy would require that
the defined <compat> decomposition be given in canonical
order.

Look, for example, at the many Arabic <compat> decompositions,
which could not be made canonical for the simple reason that
the Unicode stability policy guarantees that decompositions
will be defined in canonical order, and that a canonical
decomposition contains at most a character pair whose second
character is not canonically decomposable...
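The difference between a <compat> and a canonical mapping can be seen on one of those Arabic presentation forms, U+FC60 ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM, using Python's `unicodedata`. This is a sketch: the helper `is_canonically_ordered` is hypothetical and only checks that no combining mark is directly followed by a mark of lower nonzero combining class.

```python
import unicodedata

# U+FC60 has only a compatibility (<isolated>) decomposition, no canonical one.
LIGATURE = "\ufc60"


def is_canonically_ordered(s: str) -> bool:
    """Hypothetical helper: True if no non-starter is followed by a lower-class non-starter."""
    classes = [unicodedata.combining(c) for c in s]
    return all(not (a > b > 0) for a, b in zip(classes, classes[1:]))


# The mapping string starts with the <isolated> compatibility tag.
print(unicodedata.decomposition(LIGATURE))

# NFC and NFD ignore compatibility mappings, so the ligature survives unchanged...
assert unicodedata.normalize("NFC", LIGATURE) == LIGATURE
assert unicodedata.normalize("NFD", LIGATURE) == LIGATURE

# ...while NFKD applies it: SPACE, FATHA (class 30), SHADDA (class 33).
nfkd = unicodedata.normalize("NFKD", LIGATURE)
assert nfkd == "\u0020\u064e\u0651"
assert is_canonically_ordered(nfkd)
```

A <compat> mapping like this is exactly what NFC and NFD leave alone, which is why it would not trigger the reordering shown for a canonical PATAH-HIRIQ decomposition.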

-- Philippe.
