On Sun, 24 May 2020 17:18:27 +0300 Eli Zaretskii <e...@gnu.org> wrote:
> > Date: Sat, 23 May 2020 21:42:24 +0100
> > From: Richard Wordingham <richard.wording...@ntlworld.com>

> > > As for different scripts: if the character codepoints are the
> > > same, Emacs currently assigns each character to a single script.

> > I'll need to dig deeper.  Composition of both 'a' and Greek alpha
> > with an acute accent works, which suggest that the problem isn't
> > there for characters with a script property of 'inherited'.

> Emacs currently leaves it up to HarfBuzz to guess the script, as it
> doesn't yet have the necessary smarts.

I thought the issue lay within Emacs.  HarfBuzz has been fairly
civilised about combining marks in the 'wrong' script run.  If I put
Thai marks in what is basically a Tai Tham script run, it seems to
treat them properly.  I do such a strange thing because the marks have
been borrowed into Tai Tham, but not yet encoded.  I was told I
couldn't do this in Emacs 24.

It seems to me that Emacs knows what script a cluster is in; perhaps it
just hasn't united the concepts.  Users may have written some weird
clustering combinations, and I can imagine some weird combinations in
the Private Use Areas.  I should investigate.

> > The behaviour in 27.05 is the almost the same as for 24.4, but the
> > breaking in item (1) is automatically repaired.

> > Pressing the 'delete' key still deletes a single character, but may
> > be that because it's mapped to tpu-delete-current-char.

It's OK, it's still working with emacs -q.  That means one can easily
replace the initial character of a cluster.

> If you press DEL (or Backspace), it will delete a single codepoint.

That only deletes the final cluster.

> > So, what's not working in Arabic is that one can't move the cursor
> > through ligatures.
>
> That's a feature (you can disable it with disable-point-adjustment).

Is this documented in info, or does one have to trawl the code to find
out what it does?

It seems that Emacs needs several levels of movement - by codepoint, by
grapheme cluster, by akshara (which will be the same as grapheme
cluster in many cases) and by HarfBuzz cluster, or whatever is used to
make access into lam-alif impossible.  Visible motion by akshara is the
minimum requirement for English, so that stepping through 'ffi' will
visibly advance the cursor.  LibreOffice Writer aims to provide visible
cursor motion at the grapheme cluster level, so one can use the cursor
to step through the consonants in an akshara.

By codepoint is useful for editing complex aksharas; it is even more
useful if the cursor acts like a cluster terminator, but that is
probably a matter of personal taste.  It will also be useful for
editing narrow phonetic transcriptions, which can be quite heavy on
diacritics.

By grapheme cluster (at least, by default grapheme cluster) is the
level encouraged by Unicode, and will give you letter-by-letter control
even if you're editing Sanskrit in an Indian script.  For Arabic,
European and Hebrew scripts, this is the same as akshara level.

By akshara is the current default movement level for most Indian
scripts in Emacs.  It is also the level at which most Hindi speakers
claim to operate.  (I get the impression, however, that a lot of
Indians do their fine-level editing of complicated text in
transliteration!)

By HarfBuzz cluster takes you to the level where HarfBuzz will easily
give you cursor positions.  Now occasionally HarfBuzz's actual clusters
won't combine whole grapheme clusters or aksharas.  For example, Thai
vowels could be roughly placed for Thai without taking account of the
previous letters, just as on typewriters, and one can even handle Thai
tone marks like that.  It's possible that in these cases, HarfBuzz will
not form clusters.  How you handle these cases is up to you.  I would
make 'by HarfBuzz cluster' the coarsest.  I don't think motion by
HarfBuzz cluster is useful - perhaps you know of a use.

Richard.
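
To make the difference between the two finest levels above concrete,
here is a rough Python sketch of where the cursor could stop when
moving by codepoint and when combining marks are kept with their base
character.  The function names are hypothetical, and the clustering
rule (no stop before a character whose general category is a mark) is
only a crude stand-in for the default grapheme cluster rules; it knows
nothing about aksharas, ZWJ, Hangul or what HarfBuzz would report.

```python
import unicodedata


def codepoint_stops(text):
    """Cursor positions when moving one codepoint at a time."""
    return list(range(len(text) + 1))


def approx_cluster_stops(text):
    """Cursor positions when marks stay attached to the preceding base.

    Assumption for illustration: a stop is allowed before any character
    whose general category is not a mark (Mn, Mc, Me).  This is NOT the
    full default grapheme cluster algorithm.
    """
    stops = [0]
    for i, ch in enumerate(text):
        if i > 0 and not unicodedata.category(ch).startswith("M"):
            stops.append(i)
    if text:
        stops.append(len(text))
    return stops


# 'a' + combining acute, Greek alpha + combining acute, then 'b'
sample = "a\u0301\u03b1\u0301b"
print(codepoint_stops(sample))
print(approx_cluster_stops(sample))
```

With this rule the cursor skips over the combining acute in both the
Latin and the Greek case, whereas codepoint motion stops between base
and accent, which is the position one wants when replacing just the
base or just the mark.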
_______________________________________________
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/harfbuzz