Re: Yerushala(y)im - or Biblical Hebrew

Peter Kirk Wed, 23 Jul 2003 03:21:06 -0700

On 22/07/2003 20:34, John Hudson wrote:

At 06:00 PM 7/22/2003, Rick McGowan wrote:
A solution with CGJ has been proposed, which is very general and can be
applied to this and other such situations.
I get the impression that CGJ support is not very high on the list of things going to be implemented any time soon by the application developers that matter to us. I'm not saying this is right, only that it raises practical concerns about recommending this solution. Other control characters that have been around longer may not pose this problem, but may still require updates to existing Hebrew engines. I'm currently trying to figure out what works and what does not in the existing implementations. We're already recommending ZWNJ to inhibit meteg + hataf vowel ligation, but this has problems because the control character breaks the mark positioning lookups. I've yet to determine whether this is a fault in the font lookups, the shaping engine, particular apps or text services, or something fundamental to the architecture.

John Hudson
Tiro Typeworks        www.tiro.com
Vancouver, BC        [EMAIL PROTECTED]

I hope you are not suggesting that application developers are prepared to implement changes to support the proposals which they themselves have put forward to the UTC, but are not prepared to implement changes to support alternative fixes to the same problems, which the UTC may prefer because they are acceptable to users. That would be an acceptable position if the alternative fix were much harder to implement than the proposal the developers prefer. But in this case the alternative fix, using CGJ, seems to be a very trivial matter for a rendering engine: all it needs to do is delete any CGJ character from its input stream before it attempts any positioning, but not before doing any normalisation. Of course, this doesn't mean that any particular rendering engine can currently be programmed to do this.
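To illustrate why the order of the two steps matters, here is a minimal sketch in Python, using only the standard unicodedata module. The function name is my own invention and the character sequence (lamed, patah, CGJ, hiriq) is just a convenient test case; no existing rendering engine is claimed to work this way.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, canonical combining class 0


def strip_cgj_after_normalising(text: str, form: str = "NFC") -> str:
    """Normalise first, then delete CGJ (hypothetical helper for illustration)."""
    normalised = unicodedata.normalize(form, text)
    return normalised.replace(CGJ, "")


# lamed + patah (class 17) + CGJ + hiriq (class 14)
sample = "\u05dc\u05b7\u034f\u05b4"

# CGJ blocks canonical reordering, so patah stays before hiriq.
kept_order = strip_cgj_after_normalising(sample)

# Deleting CGJ too early lets normalisation swap the two vowels.
lost_order = unicodedata.normalize("NFC", sample.replace(CGJ, ""))
```

With CGJ removed only after normalisation, the engine still sees patah before hiriq; with CGJ removed first, canonical reordering puts hiriq before patah and the distinction is gone.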

In fact it seems to me that the biblical Hebrew rendering problems which I have heard about (on various lists and privately) could be solved easily by introducing a simple pre-processing pass into the rendering engine. (But this is not a fix to the Yerushala(y)im problem or the meteg ordering problem.) This pre-processing pass should sort any combination of a base letter and its following combining marks into an order which is efficient for the rendering engine, not necessarily the Unicode canonical order; for example, according to the "custom combining classes" of ftp://publisher.libronix.com/drop/Tiro/SBLHebrew-Distribution/SBLHebrew-Manual.pdf. It should also delete characters which are not actually to be rendered, e.g. CGJ. This pass would also satisfy the preference stated in Unicode conformance requirement C9 in http://www.unicode.org/book/preview/ch03.pdf: "Ideally, an implementation would always interpret two canonical-equivalent character sequences identically." In any practical case this is a sort of no more than four or five combining characters according to fixed classes, so it can be performed very quickly if programmed into the rendering engine at a binary level (though not necessarily if attempted in the rendering engine's high-level language, which is not designed for this), especially as shortcuts such as hash tables can be used for commonly encountered input orderings, including the Unicode canonical ordering.
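A rough sketch of such a pass, in Python, might look like the following. Everything here is an assumption for illustration: the class numbers in RENDER_CLASS are invented and are NOT the "custom combining classes" of the SBL Hebrew manual, the function names are hypothetical, and normalising to NFD before sorting is my own choice, made so that canonically equivalent inputs reach the sort in the same order. The cache stands in for the hash-table shortcut. Like the pass described above, it makes no attempt to address the Yerushala(y)im or meteg ordering cases.

```python
import unicodedata
from functools import lru_cache

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, not rendered

# HYPOTHETICAL rendering classes, invented for this sketch only.
RENDER_CLASS = {
    "\u05c1": 10,  # shin dot
    "\u05c2": 10,  # sin dot
    "\u05bc": 20,  # dagesh
    "\u05b0": 30,  # sheva
    "\u05b4": 30,  # hiriq
    "\u05b7": 30,  # patah
    "\u05b8": 30,  # qamats
    "\u05bd": 40,  # meteg
}
UNKNOWN_CLASS = 255  # marks not listed above sort last


@lru_cache(maxsize=1024)  # hash-table shortcut for frequently seen mark runs
def sort_marks(marks: str) -> str:
    # sorted() is stable, so marks sharing a class keep their incoming order.
    return "".join(sorted(marks, key=lambda m: RENDER_CLASS.get(m, UNKNOWN_CLASS)))


def preprocess(text: str) -> str:
    """Reorder each run of combining marks for rendering and drop CGJ."""
    out = []
    marks = []
    for ch in unicodedata.normalize("NFD", text):  # normalise before deleting CGJ
        if ch == CGJ:
            continue  # deleted only after normalisation has already happened
        if unicodedata.category(ch).startswith("M"):
            marks.append(ch)
            continue
        out.append(sort_marks("".join(marks)))  # flush marks of the previous base
        marks.clear()
        out.append(ch)
    out.append(sort_marks("".join(marks)))  # flush marks of the final base
    return "".join(out)


# bet + dagesh + patah and bet + patah + dagesh are canonically equivalent.
a = preprocess("\u05d1\u05bc\u05b7")
b = preprocess("\u05d1\u05b7\u05bc")
```

Both inputs come out as bet, dagesh, patah under these made-up classes, which is the behaviour the quoted C9 preference asks for.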

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/
