On Sun, 09 Nov 2025 23:03:42 +0800,
Ihor Radchenko wrote:
> 
> Huang Jing <[email protected]> writes:
> 
> > When writing Org documents that contain CJ (Chinese and Japanese)
> > text, inline markup such as |*bold*|, |/italic/| etc often fails to be
> > recognized correctly, since CJ writing conventions do not divide words
> > using spaces, while Org's parser relies on whitespace or punctuation
> > to detect markup boundaries.
> >
> >   测试*文本*           -> test *text* (Chinese)
> >   テスト*テキスト*     -> test *text* (Japanese)
> >
> > One workaround is using zero-width spaces, though this can be
> > problematic for TeX export backend since the CJK supporting macro-
> > packages (xeCJK, luatex-ja) might not be expecting this.
> 
> Thanks for reporting here!
> We have discussed this issue several times, but it was without native
> speakers participating in the discussion.
> 
> There are two aspects of the problem with markup in CJK languages:
> 
> 1. From Org perspective, 测试*文本* looks like an attempt to use
>    intraword markup - something that was not considered as a common use
>    case when designing Org markup. As a result, our official
>    recommendation with using zero-width space
>    (https://orgmode.org/manual/Escape-Character.html) becomes rather awkward.

Indeed. Native speakers of languages that separate words with spaces
often don't realize that some languages don't rely on spaces at all.
(I'm not a linguist, but that's the general idea.)

Using zero-width spaces (ZWS) can be an acceptable workaround if
there's proper tooling support. For instance, when LANGUAGE is set to
Chinese or Japanese, org-emphasize could automatically insert ZWS
around markers. (Strictly this should cover CJK [Chinese, Japanese,
Korean], but I don't speak Korean, and I've heard that Korean does use
spaces to divide words in some cases.)
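
To make the idea concrete, here is a minimal sketch of the insertion
logic, written in Python rather than Emacs Lisp purely for
illustration. The function names `needs_zws` and `emphasize` are
hypothetical, and the boundary test is a deliberate simplification
(any alphanumeric neighbor, which includes CJK ideographs and kana,
is treated as blocking); Org's real rules for what may precede or
follow a marker are more detailed.

```python
ZWS = "\u200b"  # ZERO WIDTH SPACE


def needs_zws(neighbor: str) -> bool:
    """Return True if `neighbor` would hide the markup boundary.

    Simplifying assumption: an alphanumeric neighbor (CJK characters
    count as alphanumeric in Python) blocks recognition; whitespace,
    punctuation, and the empty string (start or end of text) do not.
    """
    return bool(neighbor) and neighbor.isalnum()


def emphasize(text: str, start: int, end: int, marker: str = "*") -> str:
    """Wrap text[start:end] in `marker`, padding with ZWS only where needed."""
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    left = ZWS if needs_zws(before) else ""
    right = ZWS if needs_zws(after) else ""
    return (text[:start] + left + marker + text[start:end] + marker
            + right + text[end:])
```

With this sketch, emphasizing "word" in "a word here" adds no ZWS at
all, while emphasizing 文本 in 测试文本 puts a ZWS only between 试
and the opening marker.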

> 2. Even if zero-width space is used, it is not treated differently
>    during export - Org mode leaves zero-width spaces surrounding the
>    markup intact, which is problematic with LaTeX export, where
>    zero-width spaces are rendered as full-width spaces.
> 
> We have previously discussed both aspects (multiple times).
> 
> For the export, I once suggested a patch that would remove *all* zero-width
> spaces during export:
> https://list.orgmode.org/orgmode/87v8rkav2x.fsf@localhost/
> However, Max pointed that zero-width spaces might actually be added
> intentionally for the purposes of creating non-breakable line.

I'm afraid it's the opposite. I believe ZWS is actually intended to
indicate a potential line break, by marking the word boundary, rather
than prevent one. For the purposes of creating non-breakable lines,
one should use non-breaking space (NBSP, U+00A0) or word joiner (WJ,
U+2060) instead.
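
For reference, the three characters can be told apart by code point
and Unicode name. The short Python snippet below only prints what the
standard `unicodedata` module reports; the `CANDIDATES` table and the
`describe` helper are names made up for this example. The standard
library does not expose line-breaking classes, so the break behaviour
itself (ZWS allows a break, NBSP and WJ prevent one) is not checked
here.

```python
import unicodedata

# Labels as used in this thread, mapped to the actual characters.
CANDIDATES = {
    "ZWS": "\u200b",
    "NBSP": "\u00a0",
    "WJ": "\u2060",
}


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of `ch`."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for label, ch in CANDIDATES.items():
    print(f"{label:5} {describe(ch)}")
```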

> Another suggestion is removing spaces only around emphasis markup
> https://list.orgmode.org/[email protected]/
> (then, we would also need to adjust ox-org to not lose the markup in
> such scenario). But then the question about intentionally used zero-width
> spaces (that happens to be around the markup) remains.

Personally, I've never seen anyone deliberately insert ZWS in their
documents. And since its purpose is simply to mark word boundaries, I
think it's fine to remove them during export, provided that the export
target has proper Chinese and Japanese word boundary detection and
thus proper line break support (as HTML does).
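
The narrower variant quoted above, dropping only the ZWS that sit
directly next to emphasis markers, could look roughly like the Python
sketch below. It works on raw text with regular expressions, which is
an assumption made for brevity; a real exporter would operate on the
parsed tree instead. The marker set and the name
`strip_zws_around_markup` are likewise illustrative.

```python
import re

ZWS = "\u200b"
MARKERS = re.escape("*/_=~+")  # Org emphasis markers, escaped for a char class

# A ZWS directly before an opening marker (marker followed by non-space).
_BEFORE_OPEN = re.compile(rf"{ZWS}(?=[{MARKERS}]\S)")
# A ZWS directly after a closing marker (marker preceded by non-space).
_AFTER_CLOSE = re.compile(rf"(?<=\S[{MARKERS}]){ZWS}")


def strip_zws_around_markup(text: str) -> str:
    """Remove ZWS adjacent to emphasis markers; leave every other ZWS alone."""
    return _AFTER_CLOSE.sub("", _BEFORE_OPEN.sub("", text))
```

Note that the stripped text no longer parses as markup in Org itself,
which is the ox-org concern raised in the quoted paragraph.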

> And we can also do something just for LaTeX export. But then what about
> other export backends? Huang Jing, what are the conventions about
> zero-width spaces in Chinese HTML pages, OpenOffice documents, and in
> plain text?

As for different export backends, I don't think we should preserve ZWS
in HTML or OpenOffice exports, since they don't need ZWS for Chinese
and Japanese word boundary detection. For plain text, it's debatable:

*keep ZWS* since some plain text viewers might rely on spaces for
 line breaks, and if a viewer correctly treats ZWS as a breakable
 space but does not support CJK line breaking, we shouldn't remove
 intentional ZWS.

*remove ZWS* since it's very unlikely that a text viewer supports ZWS
 but not Chinese and Japanese line breaking, given that the Unicode
 Standard (UAX #14) specifies how to break text into lines for the
 scripts it supports.
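
One way to express such a per-backend choice is a small policy table.
The Python sketch below is hypothetical: the backend names mirror Org
exporters, but `STRIP_ZWS` and `export_filter` are not existing Org
options, and the `ascii` entry is set to "keep" only because that case
is still open. Unknown backends keep their ZWS, as the conservative
default.

```python
ZWS = "\u200b"

# Hypothetical policy: True means "drop every ZWS for this backend".
STRIP_ZWS = {
    "html": True,    # browsers handle CJK line breaking themselves
    "odt": True,     # same assumption for OpenOffice documents
    "latex": True,   # ZWS is reported in this thread to render badly
    "ascii": False,  # debatable; kept until the question is settled
}


def export_filter(text: str, backend: str, policy=STRIP_ZWS) -> str:
    """Return `text` with ZWS removed if the policy says so for `backend`."""
    if policy.get(backend, False):
        return text.replace(ZWS, "")
    return text
```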

> For the Org markup itself, we have considered alternative escape
> mechanisms. For example, I once suggested \--{} (new entity) to serve as
> a replacement for zero-width spaces.
> https://list.orgmode.org/orgmode/87mtct9y1f.fsf@localhost/
> It still feels awkward though.
> 
> Another idea is mimicking markdown with its doubled emphasis markers
> text**bold**text.
> 
> Or we can make use of the discussed future inline special block markup:
> text@*{bold}text or text@bold{bold}text.
> https://list.orgmode.org/orgmode/875xwqj4tl.fsf@localhost/

Maybe I should post a question on the Emacs China forum to see what
other users think about this?

> The question is whether any of the above ideas will be practical for
> people who will need to use them often.
> 
> -- 
> Ihor Radchenko // yantar92,
> Org mode maintainer,
> Learn more about Org mode at <https://orgmode.org/>.
> Support Org development at <https://liberapay.com/org-mode>,
> or support my work at <https://liberapay.com/yantar92>
