On 07/07/2003 19:23, John Hudson wrote:

At 08:51 07/07/2003, Ted Hopp wrote:


Editing would also be an
"interesting" experience. Could one search for lamed-patah and find it as
part of lamed-<patah+hiriq>? Or would the proposal be to use these new codes
only as part of bookend processing around normalization (i.e., automatically
recognize the sequences and substitute, normalize, and then automatically
substitute back)?


I suppose the latter is feasible. I am very keen that *any* solution should be invisible to the user.
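
For illustration only, the "bookend" idea could be sketched roughly as follows in Python. The placeholder U+E000 (a private-use code point with combining class 0) and the function name are hypothetical choices made for this sketch, not part of any proposal. The result is deliberately left out of normalised order, and the sketch assumes the input never already contains U+E000.

```python
import unicodedata

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"
PLACEHOLDER = "\uE000"  # hypothetical stand-in; private use, combining class 0


def normalize_keeping_patah_hiriq(text: str) -> str:
    """Hide patah+hiriq behind a placeholder, normalise, then restore it."""
    # Step 1: recognise the sequence and substitute a single placeholder.
    protected = text.replace(PATAH + HIRIQ, PLACEHOLDER)
    # Step 2: normalise; the placeholder is a starter, so it is left alone.
    normalized = unicodedata.normalize("NFC", protected)
    # Step 3: substitute the original sequence back in its written order.
    return normalized.replace(PLACEHOLDER, PATAH + HIRIQ)


word = LAMED + PATAH + HIRIQ
plain = unicodedata.normalize("NFC", word)       # ordinary normalisation
bookended = normalize_keeping_patah_hiriq(word)  # written order preserved
```

Here `plain` comes back with the two vowels swapped, while `bookended` keeps the order as typed, at the cost of no longer being in a normalised form.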

Would it work to define a new character, for example patah-hiriq, with a canonical decomposition into patah plus hiriq, or even into hiriq plus patah? Would normalisation then compose a patah-hiriq sequence into this character and so get round the reordering problem? Remember that the reverse sequence is not actually attested, as far as I can tell, for any of the sequences in question.
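
To make the reordering problem concrete, here is a small Python check using the standard `unicodedata` module. It relies only on the canonical combining classes the standard assigns (hiriq 14, patah 17); the variable names are just for this sketch.

```python
import unicodedata

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"

# Canonical combining classes as assigned by the standard.
ccc_hiriq = unicodedata.combining(HIRIQ)
ccc_patah = unicodedata.combining(PATAH)

# The order in which the text is written: lamed, patah, hiriq.
written = LAMED + PATAH + HIRIQ

# Normalisation sorts adjacent combining marks by combining class,
# so hiriq (lower class) is moved in front of patah.
normalized = unicodedata.normalize("NFC", written)

# A plain substring search for lamed-patah succeeds before
# normalisation and fails afterwards.
found_before = (LAMED + PATAH) in written
found_after = (LAMED + PATAH) in normalized
```

Because hiriq has the lower combining class, `normalized` is lamed, hiriq, patah, and a naive search for lamed-patah no longer matches it.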



I think we need to keep Peter Constable's point in mind that current usage
should not define the limits of Unicode functionality. Since the principle
is that all sequences of character codes are permitted (2.10), it seems
wrong to supply a fix for only "the small number of attested sequences".

But I agree here. The kind of solution I have just proposed is in danger of escalating in the same way that the number of Latin characters escalated, until a decision was made not to add any more.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




