https://bugs.documentfoundation.org/show_bug.cgi?id=66791

--- Comment #32 from Jonathan Clark <[email protected]> ---
This bug is due to the greedy algorithm we use to assign script types to
weakly-associated characters. It does not properly handle punctuation.

The current algorithm works something like this:

- First, any weak characters at the start of a paragraph are assigned to the
same script as the first strong character in the paragraph.
- Then, the paragraph is scanned in reading order. Weak characters are assigned
to the previously-seen script, with a few hard-coded exceptions (e.g. bug
112594).
- Finally, we run the Unicode bidi algorithm, and reassign all right-to-left
text to the complex script type.
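
To make the problem concrete, here is a rough sketch of the first two steps in
Python. It is not the actual LibreOffice code: classify() and assign_scripts()
are hypothetical names, the classifier is a toy based on Unicode character
names, and both the hard-coded exceptions and the final bidi pass are left out.

```python
import unicodedata

LATIN, ASIAN, COMPLEX, WEAK = "latin", "asian", "complex", "weak"


def classify(ch):
    """Toy classifier (an assumption for illustration, not the real tables)."""
    if not ch.isalpha():
        # Punctuation, spaces and digits are treated as weak here.
        return WEAK
    name = unicodedata.name(ch, "")
    if name.startswith(("CJK", "HIRAGANA", "KATAKANA", "HANGUL")):
        return ASIAN
    if name.startswith(("HEBREW", "ARABIC")):
        return COMPLEX
    return LATIN


def assign_scripts(paragraph):
    """Greedy assignment: steps 1 and 2 of the list above only."""
    kinds = [classify(ch) for ch in paragraph]
    # Step 1: leading weak characters take the first strong script
    # (falling back to LATIN if the paragraph has no strong characters).
    previous = next((k for k in kinds if k != WEAK), LATIN)
    result = []
    # Step 2: scan in order; weak characters inherit the previous script.
    for kind in kinds:
        if kind != WEAK:
            previous = kind
        result.append(previous)
    return result


scripts = assign_scripts("中文 (abc) 中文")
print(scripts[3], scripts[7])  # script of "(" and of ")"
```

In this sketch the opening parenthesis inherits the script of the CJK text
before it, while the closing parenthesis inherits the script of the Latin text
inside the pair, so the two halves of one pair end up with different script
types.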

The last step hides the depth of the problem. The Unicode bidi algorithm
accounts for nested punctuation, so the output looks mostly correct, though
still buggy, for RTL languages (while not working at all for other language
pairs).


In my opinion, we should replace the current algorithm with one that extends
the RTL behavior to all languages. Existing RTL documents depend on the current
behavior, and impacted CJK documents likely already include manual formatting
to achieve the same effect, so this seems like the least-disruptive option.
