Re: Yerushala(y)im - or Biblical Hebrew

Peter Kirk Tue, 08 Jul 2003 07:06:48 -0700

On 08/07/2003 02:23, Peter Kirk wrote:

> Would it work to define a new character, for example for patah-hiriq, which has a canonical decomposition into patah plus hiriq, or even into hiriq plus patah? Would normalisation compose a patah-hiriq sequence into this character and so get round the reordering problem? Remember that the reverse sequence is actually not attested, as far as I can tell, for any of the sequences in question.

A couple of off-list comments have made it clear to me that this proposal needs some clarification and adjustment. But I think it can still be made to work. It is a nasty kludge, but then, as someone pointed out, any solution to this problem is bound to be a nasty kludge. In some ways it is less nasty than others that have been suggested, and it avoids some of the disadvantages that have been mentioned. It also has the advantage that no existing text needs to be recoded. That doesn't make it my preferred solution (that is still the CGJ solution), but it is at least worth considering.
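
For comparison, the CGJ solution mentioned here relies on U+034F COMBINING GRAPHEME JOINER having canonical combining class 0, so canonical reordering never moves a mark across it. A minimal check with Python's standard unicodedata module, assuming bet (U+05D1) as an arbitrary base letter chosen only for illustration:

```python
import unicodedata

# Bet is an arbitrary base letter chosen for illustration.
BET, PATAH, HIRIQ, CGJ = "\u05D1", "\u05B7", "\u05B4", "\u034F"

# Patah then hiriq, as found in the text: normalisation reorders the points.
plain = BET + PATAH + HIRIQ
# The same sequence with CGJ between the points. CGJ has combining class 0,
# so canonical reordering cannot move hiriq in front of patah.
with_cgj = BET + PATAH + CGJ + HIRIQ

print(unicodedata.combining(CGJ))                          # 0
print(unicodedata.normalize("NFC", plain) == plain)        # False
print(unicodedata.normalize("NFC", with_cgj) == with_cgj)  # True
```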

This solution requires adding a new character for each vowel sequence found in Hebrew texts. Six such sequences have so far been identified in the WTS Bible text, though one of these (sheva-hiriq) is already in canonical order and so is not a problem. So this needs fewer new characters than the earlier proposal, but there may be other sequences in other texts. It also relies on none of these sequences occurring in the reverse order, although we cannot guarantee that this is true for all texts. I will use the patah-hiriq sequence as an example; each of the other sequences would be handled separately in the same way.
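
The reordering problem comes from the fixed canonical combining classes of the Hebrew points. Canonical ordering sorts adjacent non-starters by class, so a sequence whose classes already ascend survives normalisation and one whose classes descend does not. A short sketch using Python's unicodedata, covering only the two sequences named here (bet is again an arbitrary base letter):

```python
import unicodedata

BET = "\u05D1"
POINTS = {"sheva": "\u05B0", "hiriq": "\u05B4", "patah": "\u05B7"}

# Canonical combining classes: sheva 10, hiriq 14, patah 17.
for name, ch in POINTS.items():
    print(name, unicodedata.combining(ch))

def survives_normalisation(first, second):
    """True if base + first point + second point is unchanged by NFD."""
    s = BET + POINTS[first] + POINTS[second]
    return unicodedata.normalize("NFD", s) == s

print(survives_normalisation("sheva", "hiriq"))  # True: 10 < 14
print(survives_normalisation("patah", "hiriq"))  # False: 17 > 14, so reordered
```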

The solution for this sequence is as follows. Define a new combining character, with a name something like HEBREW LIGATURE PATAH HIRIQ, with a canonical decomposition of hiriq + patah (yes, that way round) and a glyph with the hiriq to the left of the patah. How does this help? Well, it will not affect users who type patah then hiriq, in non-canonical order, into an application which does not immediately normalise the text, as the renderer will still render the hiriq to the left of the patah, as typed. But when this text is normalised into NFC, the sequence will first be reordered as hiriq + patah, and then this combination will be composed into the new ligature. That is correct, isn't it? So an application which renders the NFC text will see the new character and should render it according to its glyph. In NFD text the hiriq + patah sequence remains, but it is, I think, customary if not required for the renderer to combine the glyphs into the defined ligature before rendering. So in every case the end user sees the hiriq to the left of the patah, although the underlying encoding is in fact reversed.
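
One way to test the NFC step is to look at how existing data treats a precomposed character whose canonical decomposition begins with a non-starter. The closest precedent appears to be U+0344 COMBINING GREEK DIALYTIKA TONOS, which decomposes to U+0308 U+0301. UAX #15 lists such "non-starter decompositions" among the composition exclusions, so NFC decomposes them and never recomposes them. The proposed ligature would also decompose to a non-starter (hiriq), so it seems likely to fall under the same rule; that is an inference from the precedent, not something checked against any proposal text. A minimal sketch with Python's unicodedata:

```python
import unicodedata

DIALYTIKA_TONOS = "\u0344"    # canonical decomposition: U+0308 U+0301
DIAERESIS, ACUTE = "\u0308", "\u0301"

# Decomposition as recorded in the Unicode Character Database.
print(unicodedata.decomposition(DIALYTIKA_TONOS))   # 0308 0301

# The decomposition starts with a non-starter (class 230), so NFC
# decomposes U+0344 and leaves the two marks uncomposed.
print(unicodedata.combining(DIAERESIS))             # 230
print(unicodedata.normalize("NFC", DIALYTIKA_TONOS) == DIAERESIS + ACUTE)  # True
```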

Have I missed anything vital here? I know that the interaction with cantillation marks, some of which can appear between the patah and the hiriq, may need more study.
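
To see what normalisation does when a cantillation mark sits between the two points, the same kind of check can be run with an accent inserted. ETNAHTA (U+0591) is used here only as an example of an accent below. Its combining class (220) is much higher than either point's, so canonical ordering moves it after both, and the hiriq and patah end up adjacent in hiriq + patah order:

```python
import unicodedata

BET, PATAH, HIRIQ, ETNAHTA = "\u05D1", "\u05B7", "\u05B4", "\u0591"

typed = BET + PATAH + ETNAHTA + HIRIQ
nfd = unicodedata.normalize("NFD", typed)

# Stable sort by combining class: hiriq (14), patah (17), then the accent.
print([hex(ord(c)) for c in nfd])
print(nfd == BET + HIRIQ + PATAH + ETNAHTA)  # True
```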

Of course, we could simply store the reversed order without defining a new character. But renderers would then need a clear instruction somewhere in the Unicode Standard that, as an exception to the normal rules for rendering multiple diacritics, the hiriq should be positioned to the left of the patah, and similarly for the other attested sequences.
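
The reversed storage order is stable under normalisation: hiriq followed by patah already has ascending combining classes, so every normalisation form leaves it alone, and the typed patah + hiriq order normalises into it. The burden then falls entirely on the renderer. A quick check with Python's unicodedata (is_normalized requires Python 3.8 or later):

```python
import unicodedata

BET, PATAH, HIRIQ = "\u05D1", "\u05B7", "\u05B4"
stored = BET + HIRIQ + PATAH  # reversed, i.e. canonical, storage order

# No normalisation form changes the stored sequence.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, stored) == stored

print(unicodedata.is_normalized("NFC", stored))                         # True
print(unicodedata.normalize("NFC", BET + PATAH + HIRIQ) == stored)      # True
```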

--
Peter Kirk
http://web.onetel.net.uk/~peterkirk/