michaelh added a comment.
In D11552#230870 <https://phabricator.kde.org/D11552#230870>, @alexeymin wrote:

> Regarding this - `I don't know if it is really Chinese; it looks foreign enough to me anyway.`
> Some lines of text in your test script surely look like Japanese Hiragana to me, especially this one (and the tests related to it):
>
>   echo "otto东到宛平路anna"> "終末なにしてますか?忙しいですか?救ってもらっていいですか? EP01 太阳の倾いたこの世界で -broken chronograph-.txt"

That's the only thing I was sure of (it was in fact an mkv I had just watched). At this stage the actual language does not really matter.

> But do your ranges include those characters? This answer on Stack Overflow <https://stackoverflow.com/a/30200250/2323699> says that there are also other ranges for Hiragana, Katakana, etc., as @cfeck already said.

My rationale was not to throw in every range mentioned on that Wikipedia page, but just enough to make this work and illustrate the general approach.

> Does it pass the test for you?

All except the last two, that is '*ですか? EP01' (a mixture of Latin and Hiragana) and 'ですか' (pure Hiragana). I could lie now and say I left out the Hiragana characters on purpose. I didn't, but for Hiragana the `one grapheme = one search term` rule does not apply, so those tests should in fact fail.

@cfeck

> if Baloo doesn't handle CJK, it maybe also doesn't handle other non-Latin scripts, so I suggest to use QChar::category()

I wasn't aware of `QChar::category()`. Thank you.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11552

To: michaelh, #baloo, #frameworks, lbeltrame, bruns
Cc: alexeymin, cfeck, ashaposhnikov, michaelh, astippich, spoorun, nicolasfella, ngraham
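To illustrate the range-based approach discussed in this review, here is a minimal standard-C++ sketch. It is not the code from D11552 and not Baloo's term generator. Assumptions: code points in the CJK Unified Ideographs block (U+4E00 to U+9FFF) and Extension A (U+3400 to U+4DBF) each become a separate search term; ASCII spaces and punctuation end a term; everything else, including Hiragana, accumulates into a word. The helper names `isHanIdeograph`, `isAsciiWordChar` and `splitTerms` are hypothetical. The range list is deliberately incomplete, and one code point is treated as one grapheme, which holds for these ideographs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): true for code points in the CJK
// Unified Ideographs block and its Extension A. Full CJK coverage needs more
// ranges (Extensions B and later, Compatibility Ideographs, ...).
bool isHanIdeograph(char32_t c) {
    return (c >= 0x4E00 && c <= 0x9FFF)   // CJK Unified Ideographs
        || (c >= 0x3400 && c <= 0x4DBF);  // CJK Unified Ideographs Extension A
}

// Hypothetical helper: ASCII letters and digits are word characters.
bool isAsciiWordChar(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') ||
           (c >= U'0' && c <= U'9');
}

// Hypothetical term splitter, not Baloo's implementation:
// - each Han ideograph becomes its own term (one grapheme = one search term);
// - any other ASCII character (space, '?', '-', ...) ends the current term;
// - everything else, including Hiragana and Katakana, accumulates into a word.
std::vector<std::u32string> splitTerms(const std::u32string& text) {
    std::vector<std::u32string> terms;
    std::u32string current;
    auto flush = [&] {
        if (!current.empty()) {
            terms.push_back(current);
            current.clear();
        }
    };
    for (char32_t c : text) {
        if (isHanIdeograph(c)) {
            flush();
            terms.push_back(std::u32string(1, c));
        } else if (c < 0x80 && !isAsciiWordChar(c)) {
            flush();
        } else {
            current += c;
        }
    }
    flush();
    return terms;
}
```

With this sketch, `splitTerms(U"otto东到宛平路anna")` yields seven terms ("otto", the five ideographs, "anna"), while a pure Hiragana run such as "ですか" stays a single term. Non-ASCII punctuation (for example a fullwidth question mark) is not treated as a separator here; that is one of the gaps a broader, category-based check would have to cover.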