Just a reminder that the statement of the problem has not been agreed to. I don't see a vowel sequence in Yerushala(y)im.
Jony

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter Kirk
> Sent: Tuesday, July 08, 2003 3:19 PM
> To: [EMAIL PROTECTED]
> Subject: SPAM: Re: Yerushala(y)im - or Biblical Hebrew
>
> On 08/07/2003 02:23, Peter Kirk wrote:
>
> > Would it work to define a new character, for example, for patah-hiriq
> > which has a canonical decomposition into patah plus hiriq, or even
> > into hiriq plus patah? Would normalisation compose a patah-hiriq
> > sequence into this character and so get round the reordering problem?
> > Remember that the reverse sequence is actually not attested, as far as
> > I can tell for any of the sequences in question.
>
> A couple of off list comments have made it clear to me that this
> proposal needs some clarification and adjustment. But I think it can
> still be made to work. It is a nasty kludge, but then as someone
> pointed out any solution to this problem is bound to be a nasty
> kludge. In some ways it is less nasty than others that have been
> suggested, and it doesn't have some of the disadvantages that have
> been mentioned. It also has the advantage that no recoding of existing
> text is required. That doesn't make it my preferred solution (the CGJ
> solution is still that), but it is at least worth considering.
>
> This solution requires adding a new character for each vowel sequence
> found in Hebrew texts. Currently six such sequences have been
> identified in the WTS Bible text - though one of these (sheva-hiriq)
> is already in canonical order and so not a problem. So this is fewer
> new characters than the earlier proposal - but there may be other
> sequences in other texts. This relies on the fact that none of these
> sequences are found in reverse, although we cannot guarantee that this
> is true for all texts. I will use the patah-hiriq sequence as an
> example, all other sequences solved separately in the same way.
>
> The solution for this sequence is as follows: Define a new combining
> character something like HEBREW LIGATURE PATAH HIRIQ with a canonical
> decomposition of hiriq - patah (yes, that way round) and a glyph with
> a hiriq to the left of a patah. How does this help? Well, it will not
> affect users who type patah then hiriq, in non-canonical order, into
> an application which does not immediately normalise the text, as the
> renderer will still render hiriq to left of patah as typed. But when
> this text is normalised into NFC, the sequence will first be reordered
> as hiriq - patah, and then this combination will be composed into the
> new ligature. That is correct, isn't it? So an application which
> renders the NFC text will see the new character and should render it
> according to its glyph. In NFD text, the hiriq - patah sequence
> remains, but it is, I think, customary if not required for the
> renderer to combine the glyphs into the defined ligature before
> rendering. So in every case the end user sees hiriq to the left of
> patah, although in fact the underlying encoding is reversed.
>
> Have I missed anything vital here? I know that more study may be
> needed of interaction with cantillation marks, some of which can
> appear between the patah and the hiriq.
>
> Of course we could simply store the reversed order without defining a
> new character. But renderers would then need clear instruction
> somewhere in the Unicode text that, as an exception to the normal
> rules for rendering multiple diacritics, the hiriq should be
> positioned to the left of the patah and similarly for the other
> attested sequences.
>
> --
> Peter Kirk
> [EMAIL PROTECTED]
> http://web.onetel.net.uk/~peterkirk/
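
For anyone who wants to see the reordering step the quoted message relies on, here is a minimal sketch in Python using the standard unicodedata module. It assumes lamed as the base letter purely for illustration. The proposed HEBREW LIGATURE PATAH HIRIQ is not an encoded character, so the composition step cannot be shown; only the canonical reordering, and the way an intervening CGJ (U+034F) prevents it, are demonstrated.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED (base letter chosen only for illustration)
PATAH = "\u05B7"  # HEBREW POINT PATAH
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER

# Canonical combining classes decide the order after normalisation:
# the mark with the lower class is moved first.
hiriq_class = unicodedata.combining(HIRIQ)
patah_class = unicodedata.combining(PATAH)
print("hiriq class:", hiriq_class, "patah class:", patah_class)

# Text as typed: patah first, then hiriq (non-canonical order).
typed = LAMED + PATAH + HIRIQ
nfc = unicodedata.normalize("NFC", typed)
nfd = unicodedata.normalize("NFD", typed)
print([unicodedata.name(ch) for ch in nfc])

# With CGJ (combining class 0) between the two points,
# normalisation leaves the typed order alone.
with_cgj = LAMED + PATAH + CGJ + HIRIQ
nfc_cgj = unicodedata.normalize("NFC", with_cgj)
print([unicodedata.name(ch) for ch in nfc_cgj])
```

In both NFC and NFD the typed patah - hiriq pair comes out as hiriq - patah, which is the stored order the quoted proposal would then compose into a single new character.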

