https://bugs.kde.org/show_bug.cgi?id=440246

            Bug ID: 440246
           Summary: Chinese manual search term segmentation issues
           Product: krita
           Version: nightly build (please specify the git hash!)
          Platform: Other
                OS: Other
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: Documentation
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]
  Target Milestone: ---

(In reply to Tyson Tan from bug 419321)
> Actually, I don't think the current Search box is working properly for CJK
> languages yet. For example, it can search "笔刷" and return some results, but
> searching "笔刷预设" returns nothing. There must be some issues with the word
> dividing logic.

- `jieba` splits it into two separate terms, "笔刷" and "预设", in the search index.
- The client-side search code can only split terms by whitespace, so "笔刷预设"
(or any run of continuous CJK characters, for that matter) is considered one term.
- The search code probably only finds exact matches.
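
To illustrate the mismatch described in the three points above, here is a
simplified model in Python. It is not the actual Sphinx or `sphinx_rtd_theme`
search code; the `index` contents, the document names and the `search()`
helper are made up for the example, and it assumes exact matching, which is
only a guess.

```python
# Simplified model of the mismatch; not the real client-side search code.
# Index terms as jieba would produce them (document names are hypothetical).
index = {
    "笔刷": {"brushes.html", "presets.html"},
    "预设": {"presets.html"},
}


def search(query):
    """Split the query on whitespace and require an exact match for every term."""
    results = None
    for term in query.split():
        docs = index.get(term, set())
        results = docs if results is None else results & docs
    return results or set()


print(search("笔刷"))       # both documents
print(search("笔刷预设"))   # treated as one term with no exact match: empty set
print(search("笔刷 预设"))  # with a manual space: only presets.html
```

In this model, typing a space between the two words is enough to get results,
which matches the behaviour one would expect from whitespace-only splitting.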

(In case you are interested, you can check the generated index at [1] -- paste
its contents into a JS beautifier [2] and enable "Unescape printable chars
encoded as \xNN or \uNNNN".)

There are ways to provide `jieba` with custom dictionary terms and segmentation
rules (check its readme [3] for more info). I think we can initialize them from
`conf.py` if you would like to add some.
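
A minimal sketch of what that initialization might look like, assuming the
index is built with `jieba`'s default tokenizer in the same Python process
that loads `conf.py`. `CUSTOM_TERMS` is a hypothetical name and the single
entry is only an example; `jieba.add_word()` is described in the `jieba`
readme.

```python
# conf.py (sketch): register extra terms with jieba before the index is built.
# CUSTOM_TERMS is a hypothetical name; the entry below is only an example.
CUSTOM_TERMS = ["笔刷预设"]

try:
    import jieba
except ImportError:
    jieba = None  # jieba is not installed, so there is nothing to configure

if jieba is not None:
    for term in CUSTOM_TERMS:
        # add_word() adds the term to the default tokenizer's dictionary.
        jieba.add_word(term)
```

For a longer list, `jieba.load_userdict()` with a dictionary file would
probably be easier to maintain than a list in `conf.py`.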

However, *if* we make "笔刷预设" a full term in the index, then it seems likely
that the search term "笔刷" will not be able to yield the results indexed with
the term "笔刷预设", which might be worse than it currently is.

We can probably check whether the upstream `sphinx_rtd_theme` search code has
any improvements that could be backported, but most likely we will have to
hack together something for matching search terms to get the behaviour we
want.
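
One possible shape for such a hack, sketched in Python for brevity (the real
code would be client-side JS). This is only an assumption about what could
work, not something the theme does today; `match_term()` and the sample
indexes are hypothetical. It matches by substring in both directions, so a
short query finds longer indexed terms and a long unsegmented query falls back
to the indexed terms it contains.

```python
def match_term(query_term, index):
    """Return documents for one query term, tolerating segmentation differences."""
    # Indexed terms that contain the query term (this includes exact matches),
    # e.g. the query "笔刷" against the indexed term "笔刷预设".
    docs = set()
    for term, term_docs in index.items():
        if query_term in term:
            docs |= term_docs
    if docs:
        return docs
    # Otherwise the query may be longer than any indexed term, e.g. the query
    # "笔刷预设" against "笔刷" + "预设": intersect the documents of every
    # indexed term that occurs inside the query.
    parts = [term_docs for term, term_docs in index.items() if term in query_term]
    return set.intersection(*parts) if parts else set()


# Index as jieba currently segments it (document names are hypothetical).
split_index = {
    "笔刷": {"brushes.html", "presets.html"},
    "预设": {"presets.html"},
}
print(match_term("笔刷预设", split_index))  # only presets.html

# Index where "笔刷预设" was added as a full term.
full_index = {
    "笔刷": {"brushes.html"},
    "笔刷预设": {"presets.html"},
}
print(match_term("笔刷", full_index))  # both documents
```

The fallback does not check that the contained terms cover the whole query or
appear in order, so it can over-match; a real implementation would likely need
ranking or stricter rules.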

[1]: https://docs.krita.org/zh_CN/searchindex.js
[2]: https://beautifier.io/
[3]: https://github.com/fxsjy/jieba
