Ihor Radchenko <[email protected]> 2025/12/20:
>
> Huang Jing <[email protected]> writes:
>
> > CC'd include-yy who has helped testing out the patch so he can join
> > the discussion more easily.
> >
> > So basically after the patch (HTML) export works fine, but the display
> > in Emacs Org buffer has some issues (a regression?).
> >
> > | 中文 +English+ 中文    vs    中文+English+中文
> > |      ^ with space               ^ without space
> >
> > The first version without spaces will be displayed correctly (with the
> > crossover), while the second one with space inserted between (which
> > used to work before) does not get displayed correctly [1].
>
> This is to be expected. The patch was very rough.
> I now updated to a more proper patch, adding full-fledged Unicode
> support to the markup: "breakable" characters like Chinese, generic
> Unicode categories for opening/closing punctuation, generic Unicode
> dashes, etc are allowed with the attached patch.
Hi Ihor,

I've applied your patch and noticed a potential improvement. In the
modification of `org-set-emph-re', the third argument of the `concat'
call, "(\\([%s]\\|$\\)", seems like it could be simplified to
"\\(%s\\)", like this:

  (concat "\\(%s\\)" ;before markers
          "\\(\\([%%s]\\)\\([^%s]\\|[^%s]%s[^%s]\\)\\3\\)"
          "\\(%s\\)") ;after markers

After testing, I noticed that there are still some issues with markers
in Chinese text that contains no spaces. For example:

  你好*世界*现在我有*冰淇凌*。
  (Hello *world* now I have *ice cream*.)

In this case, `世界' and `冰淇凌' are expected to be highlighted, but
`现在我有', the text between the two marked spans, is also highlighted
as part of the markup. I'm not sure what the test results look like on
your end.

If you don't have suitable Chinese fonts installed, I hope the
following Japanese test case helps:

  あ*い*うえ*お*

Here everything except the first character "あ" gets highlighted.

Another interesting phenomenon: after applying the patch, the
combination `*+ab+' is rendered in bold in the Org buffer, even though
the asterisk * is not paired.

Regards.
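A small helper sketch for re-checking the cases above without eyeballing the buffer. The name `my/org-bold-chars' is hypothetical and not part of Org. It assumes that `*' maps to the `bold' face, as in the default `org-emphasis-alist', and that `font-lock-ensure' (Emacs 25+) fontifies a temporary Org buffer using Org's font-lock keywords.

```elisp
;; Hypothetical helper (not part of Org): fontify TEXT in a temporary
;; Org buffer and return the characters that end up with the `bold' face.
;; Assumes `*' maps to `bold', as in the default `org-emphasis-alist'.
(require 'org)

(defun my/org-bold-chars (text)
  "Return the characters of TEXT that Org fontifies with the `bold' face."
  (with-temp-buffer
    (org-mode)
    (insert text)
    ;; Fontify the whole buffer without needing `font-lock-mode' enabled.
    (font-lock-ensure)
    (let ((chars '()))
      (dotimes (i (length text))
        ;; Buffer positions start at 1, string indices at 0.
        (let ((face (get-text-property (1+ i) 'face)))
          ;; The face property may be a single face or a list of faces.
          (when (if (listp face) (memq 'bold face) (eq face 'bold))
            (push (aref text i) chars))))
      (concat (nreverse chars)))))

;; Print the bold characters for each test case from this message.
(dolist (s '("你好*世界*现在我有*冰淇凌*。" "あ*い*うえ*お*" "*+ab+"))
  (message "%s -> %s" s (my/org-bold-chars s)))
```

Running it with and without the patch prints each test string next to the characters that received the bold face, which makes the two results easy to compare.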
