https://bugs.kde.org/show_bug.cgi?id=362647
--- Comment #8 from GuHua <renyune...@gmail.com> ---

Stefan, thanks for the suggestion of writing test cases.

Yes, searching at the character level (grapheme, if I understand this word correctly) is far better than nothing. As far as I know (as a native Chinese speaker), many (or even most) Chinese people are happy enough if the software can deal with things at the character level.

Actually, using a dictionary is still not enough for Chinese: it often happens that a run of three (or more) characters can be split in two different ways, and both make sense without context. A simple example of this scenario is "化學生": both "化學" (chemistry) and "學生" (student) make sense (moreover, sometimes "化學生" as a whole also makes sense, meaning "a student whose major is chemistry"), so context is the only way we can tell how to split them correctly (e.g. "教化學生" will most likely be split into "教化" [enlighten/teach] and "學生" [student]).

Correct handling of Chinese words requires more sophisticated Natural Language Processing techniques (e.g. using machine learning), and I think that would be far beyond today's baloo (or maybe even any search / index engine). (I studied machine learning and natural language processing during my master's, so it should be safe for me to say that today's NLP techniques [for Chinese word-splitting] are not yet good enough to be used in production [compared with the character level and judged from a user's point of view, i.e. a false positive is better than a false negative].)

Classical Chinese (this is a style of composing sentences and a way of understanding characters / words, not like the difference between "Traditional Chinese" and "Simplified Chinese") makes the situation more difficult. Almost all historical texts (e.g. historical records / books / poems) (there are quite a LOT of them) are written in Classical Chinese, and nowadays Chinese people still study Classical Chinese and read those texts (though we usually don't write in Classical Chinese).
Even humans may still need some effort to read a piece of text written in Classical Chinese (but Classical Chinese is very, very concise; that's one of the reasons it exists). However, in Classical Chinese, characters "are" words in many cases, so splitting by characters is a very good choice there.
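
To make the "化學生" ambiguity concrete, here is a minimal Python sketch. It is not baloo's code: the toy dictionary and the function names (`char_tokens`, `segmentations`, `char_level_match`) are made up for illustration. It lists every dictionary-based split of a string and contrasts that with a plain character-level match. Python iterates over code points rather than full grapheme clusters; for the ideographs used here the two coincide.

```python
# Hypothetical toy dictionary, for illustration only (not a real word list).
TOY_DICT = {"化學", "學生", "教化", "教", "化", "生"}


def char_tokens(text):
    """Character-level split: one token per non-space code point."""
    return [ch for ch in text if not ch.isspace()]


def segmentations(text, words):
    """Return every way to split `text` into entries of `words`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in words:
            for rest in segmentations(text[end:], words):
                results.append([head] + rest)
    return results


def char_level_match(query, text):
    """True if the query's characters occur contiguously in the text."""
    q, t = char_tokens(query), char_tokens(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


# The dictionary alone cannot pick between these two splits.
print(segmentations("化學生", TOY_DICT))
# Character-level matching finds both words, whichever split is "right".
print(char_level_match("學生", "教化學生"), char_level_match("化學", "教化學生"))
```

With this toy dictionary, "化學生" has two valid splits and "教化學生" has three, and nothing in the dictionary says which one is intended. The character-level match reports both "學生" and "化學" inside "教化學生", which is the false-positive-rather-than-false-negative behaviour described above.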