Ihor Radchenko <[email protected]> 2025/12/20:
>
> Huang Jing <[email protected]> writes:
>
> > CC'd include-yy who has helped testing out the patch so he can join
> > the discussion more easily.
> >
> > So basically after the patch (HTML) export works fine, but the display
> > in Emacs Org buffer has some issues (a regression?).
> >
> >    | 中文 +English+ 中文 vs 中文+English+中文
> >    |    ^ with space          ^ without space
> >
> > The first version without spaces will be displayed correctly (with the
> > crossover), while the second one with space inserted between (which
> > used to work before) does not get displayed correctly [1].
>
> This is to be expected. The patch was very rough.
> I now updated to a more proper patch, adding full-fledged Unicode
> support to the markup: "breakable" characters like Chinese, generic
> Unicode categories for opening/closing punctuation, generic Unicode
> dashes, etc are allowed with the attached patch.

Hi Ihor,

I’ve applied your patch and noticed a potential improvement. In the
modification of `org-set-emph-re', the third argument to `concat',
"(\\([%s]\\|$\\)", looks like it could be simplified to "\\(%s\\)", like this:

(concat "\\(%s\\)" ;before markers
  "\\(\\([%%s]\\)\\([^%s]\\|[^%s]%s[^%s]\\)\\3\\)"
  "\\(%s\\)") ;after markers

After testing, I noticed that there are still some issues with markers
in Chinese text where there are no spaces. For example:

你好*世界*现在我有*冰淇凌*。 (Hello *world* now I have *ice cream*.)

In this case, `世界' and `冰淇凌' are expected to be highlighted, but
`现在我有' is also being highlighted as part of the markup. I'm not sure
what the test results look like on your end.

If you don't have suitable Chinese fonts installed, the following
Japanese test case shows the same problem:

あ*い*うえ*お*

You can observe that everything except for the first character "あ" is
being highlighted.

One more observation: after applying the patch, the combination `*+ab+'
is rendered in bold in the Org buffer, even though the asterisk * is
unpaired.
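
In case it helps with reproducing, here is a rough sketch of how the
three cases above could be checked without eyeballing the buffer. The
helper name `my/bold-chars' is made up for illustration, and the sketch
assumes the default `org-emphasis-alist', where * maps to the `bold'
face. It returns the characters that end up with that face, markers
included.

```elisp
(require 'org)

(defun my/bold-chars (text)
  "Return the characters of TEXT that Org fontifies with the `bold' face.
Hypothetical helper; assumes the default `org-emphasis-alist'."
  (with-temp-buffer
    (insert text)
    (org-mode)
    ;; Force fontification; temporary buffers are not fontified lazily.
    (font-lock-ensure)
    (let ((result '()))
      (dotimes (i (length text))
        (let* ((pos (+ (point-min) i))
               (face (get-text-property pos 'face)))
          ;; The `face' property may be a single symbol or a list of faces.
          (when (if (listp face) (memq 'bold face) (eq face 'bold))
            (push (char-after pos) result))))
      ;; `concat' accepts a list of characters and builds a string.
      (concat (nreverse result)))))

;; The three inputs from this message:
(my/bold-chars "你好*世界*现在我有*冰淇凌*。")
(my/bold-chars "あ*い*うえ*お*")
(my/bold-chars "*+ab+")
```

Evaluating the three calls with and without the patch applied should
make it easy to compare results across setups.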

Regards.
