> > ... Given the small number of attested sequences that would be
> > adversely affected by normalisation re-ordering, I'm beginning to
> > favour the idea of encoding these sequences as individual characters.
> > We'd probably only need three or four, plus a right meteg, to solve
> > the problem, and rendering would work fine with existing font and
> > layout engine technologies.
> > This sounds like a sensible alternative.
This would make data entry difficult for users. Nobody thinks of these character sequences as single characters.
If, as Ken suggested, it is feasible to use CGJ or another control character without the user needing to know about it, i.e. as something inserted into the backing string when the user has entered only the mark characters, then it should be feasible, and probably easier, to hide the use of these precomposed mark combinations.
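For concreteness, here is a rough sketch of the reordering problem and of the CGJ approach, using Python's standard unicodedata module. The lamed + patah + hiriq sequence is chosen only as an illustration, and the variable names are my own. Patah has canonical combining class 17 and hiriq has 14, so normalisation swaps them unless something with class 0, such as CGJ (U+034F), sits between them.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED, combining class 0
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0

def nfc(text):
    """Return the NFC form of text."""
    return unicodedata.normalize("NFC", text)

# What the user typed: patah first, then hiriq.
plain = LAMED + PATAH + HIRIQ

# The same marks with CGJ inserted in the backing string.
guarded = LAMED + PATAH + CGJ + HIRIQ

# Canonical ordering sorts adjacent marks by combining class,
# so the unguarded sequence comes back as hiriq + patah.
reordered = nfc(plain)

# CGJ has class 0, so the marks on either side of it are never
# compared with each other and the typed order survives.
preserved = nfc(guarded)
```

In this sketch `reordered` is lamed + hiriq + patah, while `preserved` is identical to `guarded`. Whether an input method or editor can insert the CGJ reliably without the user noticing is a separate question that the code does not address.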
Editing would also be an "interesting" experience. Could one search for lamed-patah and find it as part of lamed-<patah+hiriq>? Or would the proposal be to use these new codes only as part of bookend processing around normalization (i.e., automatically recognize the sequences and substitute, normalize, and then automatically substitute back)?
I suppose the latter is feasible. I am very keen that *any* solution should be invisible to the user.
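The bookend idea could be sketched roughly as follows. This is only an illustration under stated assumptions: no precomposed patah+hiriq character exists, so a private-use code point (U+E000) stands in for one, and the names PROTECTED and normalize_protected are hypothetical.

```python
import unicodedata

LAMED = "\u05DC"      # HEBREW LETTER LAMED
PATAH = "\u05B7"      # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"      # HEBREW POINT HIRIQ, combining class 14
FINAL_MEM = "\u05DD"  # HEBREW LETTER FINAL MEM

# Hypothetical stand-in for a precomposed patah+hiriq mark.
# U+E000 is a private-use code point: class 0, no decomposition.
PATAH_HIRIQ = "\uE000"

# Attested sequences that must keep their order (assumed list).
PROTECTED = {PATAH + HIRIQ: PATAH_HIRIQ}

def normalize_protected(text, form="NFC"):
    """Normalise text while keeping protected mark sequences intact.

    Assumes the input does not already contain the stand-in code
    points; if it did, the final substitution would corrupt them.
    """
    # Bookend 1: replace each protected sequence with its stand-in.
    for sequence, stand_in in PROTECTED.items():
        text = text.replace(sequence, stand_in)
    # Ordinary normalisation leaves the stand-in untouched.
    text = unicodedata.normalize(form, text)
    # Bookend 2: restore the original sequences.
    for sequence, stand_in in PROTECTED.items():
        text = text.replace(stand_in, sequence)
    return text

word = LAMED + PATAH + HIRIQ + FINAL_MEM
stored = word.replace(PATAH + HIRIQ, PATAH_HIRIQ)

# The search question: lamed-patah is found in the restored text
# but not while the stand-in is present.
found_in_restored = (LAMED + PATAH) in normalize_protected(word)
found_in_stored = (LAMED + PATAH) in stored
```

Here `normalize_protected(word)` returns `word` unchanged, whereas plain NFC would swap the two marks. The search for lamed-patah succeeds on the restored text and fails on `stored`, which is why the substitution would have to stay strictly internal to the normalisation step if editing and searching are to behave as users expect.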
I think we need to keep Peter Constable's point in mind that current usage should not define the limits of Unicode functionality. Since the principle is that all sequences of character codes are permitted (2.10), it seems wrong to supply a fix for only "the small number of attested sequences".
This is a concern, but not an overriding one. Yes, all sequences are permitted, and some will be reordered during normalisation. We are currently aware of a small number of attested sequences that definitely should not be reordered. At this stage, I really don't care whether other, unattested Hebrew mark sequences are reordered or not, just as I know there are some sequences that Uniscribe cannot render and some that my fonts cannot render. That said, it is always a possibility that some new sequence will be attested in an as yet undiscovered or unpublished manuscript, which is a legitimate if minor concern.
John Hudson
Tiro Typeworks www.tiro.com Vancouver, BC
