Maxim Nikulin <maniku...@gmail.com> writes:

> On 29/06/2021 10:47, James Harkins wrote:
>> So, it would make sense to add a rule to the exporter: if one of the
>> characters before or after a source-text line break is a Chinese,
>> Japanese or Korean character, do not add a space.
>
> On 29/06/2021 11:43, tumashu wrote:
>> You can try the below config :-)
>> (let ((regexp "[[:multibyte:]]")
>>       (string text))
>>   (setq string
>>         (replace-regexp-in-string
>>          (format "\\(%s\\) *\n *\\(%s\\)" regexp regexp)
>>          "\\1\\2" string))
>
> Notice that [[:multibyte:]] means almost any non-ASCII script, e.g.
> Cyrillic:
>
> (let ((sample "abc абв def"))
>   (and (string-match "[[:multibyte:]]\+" sample)
>        (match-string 0 sample)))
> "абв"
>
> It seems, `org-fill-paragraph' M-q is smart enough to avoid a space
> before or after a CJK character, so it is possible to determine
> correct way to splice lines, despite e.g. "Script" Unicode property is
> not exposed to elisp:
> https://www.gnu.org/software/emacs/manual/html_node/elisp/Character-Properties.html
> (Anyway maintaining explicit list of scripts is not a straightforward
> approach.)

There are a few ways to approach this:

  (aref char-script-table ?中) -> 'han
  (string-match-p "\\cc" "中") -> 0
  (aref (char-category-set ?中) ?|) -> t
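
As a rough sketch of how the first of these could drive the rule James described: the function names my/cjk-char-p and my/splice-lines are made up for illustration, and the list of script symbols is only my assumption about which values of char-script-table should count as CJK. It also assumes the text is a single paragraph with no trailing newline.

```elisp
;; Hypothetical helpers for illustration; not part of Org or Emacs.

(defun my/cjk-char-p (char)
  "Return non-nil if `char-script-table' puts CHAR in a CJK script.
The script list below is an assumption, not an authoritative set."
  (memq (aref char-script-table char)
        '(han kana hangul cjk-misc bopomofo)))

(defun my/splice-lines (text)
  "Join the lines of TEXT into one line.
A line break next to a CJK character is dropped entirely;
any other line break becomes a single space."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; Match each line break together with the spaces around it.
    (while (re-search-forward " *\n *" nil t)
      (let ((before (char-before (match-beginning 0)))
            (after (char-after (match-end 0))))
        (replace-match
         (if (or (and before (my/cjk-char-p before))
                 (and after (my/cjk-char-p after)))
             ""
           " ")
         t t)))
    (buffer-string)))

(my/splice-lines "中文\n字符")  ; => "中文字符"
(my/splice-lines "абв\nгде")    ; => "абв где"
(my/splice-lines "abc\ndef")    ; => "abc def"
```

Unlike [[:multibyte:]], this leaves the space in place for Cyrillic and other non-CJK scripts. The category-based checks could be swapped into my/cjk-char-p in the same way.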