Re: Yerushala(y)im - or Biblical Hebrew

Peter_Constable Tue, 08 Jul 2003 10:44:11 -0700

Peter Kirk wrote on 07/08/2003 08:18:33 AM:

> A couple of off list comments have made it clear to me that this 
> proposal needs some clarification and adjustment...


> The solution for this sequence is as follows: Define a new combining 
> character something like HEBREW LIGATURE PATAH HIRIQ with a canonical 
> decomposition of hiriq - patah (yes, that way round) and a glyph with a 
> hiriq to the left of a patah... But when 
> this text is normalised into NFC, the sequence will first be reordered 
> as hiriq - patah, and then this combination will be composed into the 
> new ligature. That is correct, isn't it?

Yes, but I wouldn't call it a ligature; I'd call it a precomposed or 
digraph character (and the glyph, I'd call a composite).
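
The reordering step can be checked with existing data. What follows is a sketch only, using Python's standard unicodedata module. The lamed base letter is an arbitrary choice for illustration, and since the proposed precomposed character does not exist, no composition takes place:

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED (arbitrary base letter, assumed for illustration)
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17

def names(text):
    """Return the Unicode character names of each code point in text."""
    return [unicodedata.name(ch) for ch in text]

# Encode the marks in the order < patah, hiriq >.
encoded = LAMED + PATAH + HIRIQ

# Canonical reordering sorts adjacent combining marks by combining class,
# so both normalization forms yield < hiriq, patah >.
nfd = unicodedata.normalize("NFD", encoded)
nfc = unicodedata.normalize("NFC", encoded)

print([unicodedata.combining(ch) for ch in (HIRIQ, PATAH)])
print(names(encoded))
print(names(nfc))
```

With no precomposed character defined, NFC and NFD give the same result here; the proposal would add only a final composition step to NFC.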

> So an application which renders 
> the NFC text will see the new character and should render it according 
> to its glyph. In NFD text, the hiriq - patah sequence remains, but it 
> is, I think, customary if not required for the renderer to combine the 
> glyphs into the defined ligature before rendering.

I'm not aware of anything that presently requires a renderer to combine 
the characters into a composite glyph, or to present the sequence of 
characters < hiriq, patah > with the hiriq to the left of the patah -- 
remember, the description of Hebrew currently in Unicode assumes that such 
sequences don't occur. 

But, in order for your solution to work, this rendering would *have* to be 
required. The fixed position classes would have to be understood as fixed 
relative positions; i.e. given this combination of marks, they are always 
positioned relative to one another in a fixed way, regardless of their 
encoded order. This would assume that any other positioning will never 
occur or be required -- true for cases that we know of, but it is possible 
that there are cases we do not know of, and that such a user need could 
exist in the future. You also haven't said anything about how to deal with 
accents that occur between the two vowel marks (though you did notice the 
issue), and the alternative of that same accent occurring either to the 
left or to the right of the pair of vowel marks (which offhand seems a 
likely potentiality with at least meteg -- I can't check that now since 
I'm away from the office); and these would have to be dealt with as well.
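
To illustrate the accent problem, here is a small sketch, again assuming Python's unicodedata module and an arbitrary lamed base. With the current combining classes, a meteg encoded before, between, or after the two vowel marks normalizes to one and the same sequence, so the encoded position of the accent does not survive normalization:

```python
import itertools
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED (arbitrary base letter, assumed for illustration)
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17
METEG = "\u05BD"  # HEBREW POINT METEG, canonical combining class 22

# Try every encoded order of the three marks on the same base letter.
results = set()
for marks in itertools.permutations([HIRIQ, PATAH, METEG]):
    encoded = LAMED + "".join(marks)
    results.add(unicodedata.normalize("NFD", encoded))

# All six orders collapse to a single normalized string.
for text in results:
    print([unicodedata.name(ch) for ch in text])
```

Whatever order the marks are typed in, only < hiriq, patah, meteg > comes out, which is why the relative position of the accent would need to be handled in some other way.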

Also, if the rendering of the sequence < hiriq, patah > is required to 
have hiriq to the left of the patah, then what's the point of having the 
additional digraph character? None that I can see. So, a simpler solution 
would simply be to specify the relative ordering of certain combinations of 
vowel marks, regardless of the order in which they are encoded. But we'd 
still have the other issues I mentioned in the preceding paragraph.


It occurs to me that perhaps there is a way to address the stability 
issues that are a concern for IETF while fixing the combining classes for 
other purposes. I need to think about that some more, but it seems to me 
(if the details can be worked out) the best hope for finding a 
solution without having a bunch of "Yeah, but..."s to deal with.


> Of course we could simply store the reversed order without defining a 
> new character. But renderers would then need clear instruction somewhere 
> in the Unicode text that, as an exception to the normal rules for 
> rendering multiple diacritics, the hiriq should be positioned to the 
> left of the patah and similarly for the other attested sequences.

As mentioned above, this would be necessary anyway for your solution to 
work.



- Peter


---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
