On 06/07/2003 17:22, John Hudson wrote:


Thanks for the thoughtful analysis, Peter. Eli Evans and I have been documenting all of the unique mark sequences in the Michigan-Claremont text and WTS morphology database that are potentially incorrectly re-ordered in Unicode normalisation (I say potentially, because the fixed position combining classes may, by chance, not reorder some combinations of vowels). In addition to the <patah, hiriq> and <qamats, hiriq> double vowel sequences for Yerushala(y)im, the example you cite from Exodus 20:4 involves two vowels with an interposed cantillation mark -- <qamats, etnahta, patah> -- which needs to be renderable both with and without the cantillation. The WTS morphology database also includes a <tsadi, sheva, hiriq> sequence (in 2 Ch 13:14, last word) that is not attested in either BHS or BHL; Peter Constable enquired about this, since it seemed that it might be an error, but the WTS editors assured him that it was intentional. ...
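The reordering described above can be checked with a short Python sketch using the standard unicodedata module. The helper names (normalised_marks, is_reordered) and the choice of lamed as the base letter are assumptions made purely for illustration; they are not taken from the texts under discussion.

```python
import unicodedata

# Hebrew points discussed in the thread (code points from the Unicode Hebrew block).
SHEVA, HIRIQ, PATAH, QAMATS = "\u05B0", "\u05B4", "\u05B7", "\u05B8"
ETNAHTA = "\u0591"   # cantillation mark
LAMED = "\u05DC"     # arbitrary base letter, chosen only for this demonstration


def normalised_marks(*marks, base=LAMED, form="NFC"):
    """Return the marks that follow the base letter, in post-normalisation order."""
    return unicodedata.normalize(form, base + "".join(marks))[len(base):]


def is_reordered(*marks):
    """True if normalisation changes the order in which the marks were typed."""
    return normalised_marks(*marks) != "".join(marks)


for seq in [(PATAH, HIRIQ), (QAMATS, HIRIQ), (QAMATS, ETNAHTA, PATAH), (SHEVA, HIRIQ)]:
    # Print the canonical combining class of each mark, then the verdict.
    classes = [unicodedata.combining(m) for m in seq]
    print(classes, "reordered" if is_reordered(*seq) else "unchanged")
```

Marks are sorted by ascending combining class, so a sequence that already happens to be in ascending order, such as <sheva, hiriq>, comes through unchanged, which is the "by chance" case mentioned above.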


Thank you, John. Last year I did a similar analysis of the WTS database (as released in 1998), or rather just a simple grep for the sequence vowel, zero or more cantillation placeholders (^), vowel, and found only the 637 examples I mentioned. I missed the 2 Chronicles example, perhaps because I didn't search for sheva followed by a vowel (though I did include the reverse), since :A, :E and :F are the legal WTS encodings of the hatef vowels (F=qamets). I have just run that grep, for ":\^*[IOU]", and found only the 2 Ch 13:14 example. I must say that this one looks very like an error in the WTS database: *MAX:ACOC:IRYM should be *MAX:ACOC:RIYM. As this is marked with * as a rendering of the Ketiv, it is odd to give it vowels at all, and very odd to give it a unique combination of vowels. But there may be something strange in the actual MS here that I don't know about.
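For anyone wanting to repeat the search without grep, here is a minimal Python sketch of the same pattern. The function name find_sheva_vowel is hypothetical, and the reading of ":" as sheva, "^" as the cantillation placeholder and I, O, U as full vowels simply follows the description above rather than any WTS documentation.

```python
import re

# Same expression as the grep: sheva (:), any number of cantillation
# placeholders (^), then one of the vowels I, O or U.
SHEVA_THEN_VOWEL = re.compile(r":\^*[IOU]")


def find_sheva_vowel(words):
    """Return the words containing sheva directly followed by I, O or U."""
    return [w for w in words if SHEVA_THEN_VOWEL.search(w)]


# The two forms quoted above, plus one made-up string (not from the
# database) that exercises the placeholder part of the pattern.
samples = ["*MAX:ACOC:IRYM", "*MAX:ACOC:RIYM", "X:^^UY"]
for word in find_sheva_vowel(samples):
    print(word)
```

The hatef encodings :A, :E and :F are deliberately not matched, since the character class lists only I, O and U.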



... Given the small number of attested sequences that would be adversely affected by normalisation re-ordering, I'm beginning to favour the idea of encoding these sequences as individual characters. We'd probably only need three or four, plus a right meteg, to solve the problem, and rendering would work fine with existing font and layout engine technologies.

This sounds like a sensible alternative.


--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




