In article <e1or8lz-0004if...@fencepost.gnu.org>, Eli Zaretskii <e...@gnu.org> writes:
> Where can I find the code which decides how to break text into
> LGSTRINGs?  I'd like to see such code for both Arabic and Hebrew,
> unless it's the same code.

A not-yet-shaped LGSTRING is created by autocmp_chars (composite.c)
from a character sequence that matches a regular expression PATTERN
stored in composition-function-table.  This pattern is
"[\u0600-\u06FF]+" for Arabic (lisp/language/misc-lang.el), and a
more complicated regex for Hebrew (lisp/language/hebrew.el).

> For example, can characters like digits or other neutrals be included
> in the same LGSTRING with Arabic and Hebrew?  Or will an LGSTRING
> always include characters from one script only?

An LGSTRING always includes characters of the same font.  So, even if
you write PATTERN to include the other neutrals, if a user's font
setting (or environment) decides to use a different font for those
neutrals, they are not included in the LGSTRING.  By default, Emacs
tries to use the same font for characters in the same script.

In addition, even if you set up fonts to use the same font for, for
instance, Hebrew and those neutrals, the "shape" method of a
font-backend may not support them.  In that case, the composition
fails anyway.

> I'm asking because it's possible that we will need to modify
> w32uniscribe.c to reorder R2L characters before we pass them to the
> Uniscribe ScriptShape API, to let it see the characters in the logical
> order it expects them.  That's if it turns out that Uniscribe cannot
> otherwise shape them correctly.

???  Currently characters and glyphs in an LGSTRING are always in
logical order.  A "shape" method should also shape that LGSTRING in
logical order.

---
Kenichi Handa
ha...@m17n.org

_______________________________________________
emacs-bidi mailing list
emacs-bidi@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-bidi
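
To make the two conditions above concrete (a run must match PATTERN,
and it must be covered by a single font), here is a rough Python model.
It is only a sketch and not the code in composite.c: the names
candidate_runs, split_by_font and toy_font_for_char are hypothetical,
and the font lookup is a stand-in for whatever the user's fontset
would actually select.

```python
import re
from itertools import groupby

# The Arabic pattern quoted above from lisp/language/misc-lang.el.
ARABIC_PATTERN = re.compile(r"[\u0600-\u06FF]+")


def candidate_runs(text, pattern=ARABIC_PATTERN):
    """Return (start, end) index pairs of the substrings matching pattern."""
    return [(m.start(), m.end()) for m in pattern.finditer(text)]


def split_by_font(text, start, end, font_for_char):
    """Split text[start:end] into maximal runs that share one font.

    font_for_char is a hypothetical callback mapping a character to a
    font name; it models the user's font settings, not an Emacs API.
    """
    runs = []
    pos = start
    for font, group in groupby(text[start:end], key=font_for_char):
        length = len(list(group))
        runs.append((pos, pos + length, font))
        pos += length
    return runs


def toy_font_for_char(ch):
    """Assumed setup: one font for the Arabic block, another for the rest."""
    return "arabic-font" if "\u0600" <= ch <= "\u06FF" else "latin-font"
```

With a PATTERN widened to accept digits, for example
"[\u0600-\u06FF0-9]+", candidate_runs would return one run covering
both the letters and the digits, but split_by_font with
toy_font_for_char would still cut that run where the font changes.
In this model that is why the digits do not end up in the same
LGSTRING as the Arabic letters.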