In article <e1orapf-0000gn...@fencepost.gnu.org>, Eli Zaretskii <e...@gnu.org> writes:
> > A not-yet-shaped LGSTRING is created by autocmp_chars
> > (composite.c) from a character sequence matching with a
> > regular expression PATTERN stored in a
> > composition-function-table.  This pattern is
> > "[\u0600-\u06FF]+" for Arabic (lisp/language/misc-lang.el),
> > and a more complicated regex for Hebrew
> > (lisp/language/hebrew.el).

> Thanks.  So character compositions are used not only to compose
> several characters into one glyph, but also to break text into
> individually shaped chunks, is that right?

Yes.

> If so, auto-composition-mode cannot be turned off for scripts that
> need this kind of "grouped shaping" without degrading the presentation
> of these scripts to the point of illegibility?

Yes.  And auto-composition-mode cannot be turned off for any script
for which it is not enough to display the glyphs corresponding to
the individual characters: all Indic scripts, some East Asian
scripts, Arabic, Hebrew, etc.  In this respect, Arabic is not
special.  Even for some Indic scripts, an LGSTRING may contain
multibyte grapheme clusters.

> > > I'm asking because it's possible that we will need to modify
> > > w32uniscribe.c to reorder R2L characters before we pass them to the
> > > Uniscribe ScriptShape API, to let it see the characters in the logical
> > > order it expects them.  That's if it turns out that Uniscribe cannot
> > > otherwise shape them correctly.
> >
> > ???  Currently characters and glyphs in LGSTRING are always
> > in logical order.

> See my mail from yesterday, where I describe that I see in GDB that
> Arabic characters in LGSTRINGs arrive to uniscribe_shape in visual
> order:

>   http://lists.gnu.org/archive/html/emacs-devel/2010-09/msg00029.html

In this mail, you wrote:

> Also, it looks like uniscribe_shape is repeatedly called from
> font-shape-gstring to shape the same text that is progressively
> shortened.  For example, the first call will be with a 7-character
> string whose contents is

>   {0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645}

and this character sequence is surely in logical order.  So I don't
know why you think uniscribe_shape is given an LGSTRING in visual
order.

> The next call is with a 6-character string whose contents is

>   {0x627, 0x644, 0x633, 0x651, 0x644, 0x627}

> then a 5-character string {0x627, 0x644, 0x633, 0x651, 0x644}, etc.

> Note that the first 7-character string is the first word of the Arabic
> greeting, properly bidi-reordered for display.

> Are these series of calls expected?

No.  I don't know why that happens on Windows.  On Ubuntu, when I
visit a file that contains only these lines:

------------------------------------------------------------
Arabic السّلام

;;; Local Variables:
;;; bidi-display-reordering: t
;;; End:
------------------------------------------------------------

font-shape-gstring is called just once.

As the LGSTRING gets shorter on each call, it seems that composition
fails each time.  autocmp_chars is mainly called from
composition_reseat_it.  Could you please trace the code after the
first call of autocmp_chars, and find out why Emacs decides that a
composition fails?

---
Kenichi Handa
ha...@m17n.org
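
For anyone following the thread without a debugger at hand, here is a minimal Python sketch (not Emacs code) of the two points discussed above: how the Arabic pattern carves a line into chunks to be shaped as a unit, and how to check that the reported 7-character sequence is in logical order. Python's `re` module is only a stand-in for the Emacs regex engine, and the helper names `arabic_runs` and `code_points` are hypothetical, introduced just for this illustration.

```python
import re

# The Arabic pattern quoted in the message (lisp/language/misc-lang.el).
# Python's re is an assumed stand-in for the Emacs regex matcher.
ARABIC_RUN = re.compile("[\u0600-\u06FF]+")


def arabic_runs(text):
    """Return (start, end, run) for each maximal run of Arabic-block characters."""
    return [(m.start(), m.end(), m.group()) for m in ARABIC_RUN.finditer(text)]


def code_points(s):
    """Return the code points of s in storage (logical) order."""
    return [ord(ch) for ch in s]


# First line of the test file from the message, written with escapes so the
# storage order is unambiguous regardless of how an editor displays it.
line = "Arabic \u0627\u0644\u0633\u0651\u0644\u0627\u0645"

runs = arabic_runs(line)
word = runs[0][2]

# Sequence reported from GDB for the first uniscribe_shape call.
reported = [0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645]

logical = code_points(word)
# A plain reversal is a rough approximation of visual order for a single
# right-to-left word; it is not a full bidi reordering.
reversed_order = list(reversed(logical))

# The later, shorter calls reported in the message are prefixes of the first.
prefixes = [reported[:n] for n in (7, 6, 5)]

print([hex(cp) for cp in logical])
print("matches logical order:", logical == reported)
print("matches reversed order:", reversed_order == reported)
```

The comparison only shows that the reported code points equal the word as stored in the buffer; it says nothing about why the later calls receive shorter prefixes, which still needs tracing in composition_reseat_it.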