michaelh added a comment.

  In D11552#230870 <https://phabricator.kde.org/D11552#230870>, @alexeymin 
wrote:
  
  > Regarding this - `I don't know if it is really chinese look foreign enough 
to me anyway.`
  >  Some lines of text in your test script surely look like Japanese Hiragana 
to me, especially this one (and tests related to this)
  >
  >   echo "otto东到宛平路anna"> "終末なにしてますか?忙しいですか?救ってもらっていいですか? EP01 太阳の倾いたこの世界で 
-broken chronograph-.txt"
  >
  
  
  That's the only thing I was sure of (it was in fact an mkv I had just 
watched). At this stage the actual language does not really matter.
  
  > But do your ranges include those characters? This answer on Stack Overflow 
<https://stackoverflow.com/a/30200250/2323699> says that there are also other 
ranges for Hiragana, Katakana, etc... as @cfeck already said.
  
  My rationale was not to throw in every range mentioned on that Wikipedia 
page, but to include just enough to make this work and illustrate the general 
approach.
  
  > Does it pass the test for you?
  
  All except the last two, that is '*ですか? EP01' (a mixture of Latin and 
Hiragana) and 'ですか' (pure Hiragana). I could lie now and say I left out the 
Hiragana characters on purpose. I didn't, but for Hiragana the 
`one grapheme = one search term` rule does not apply, so those tests should in 
fact fail.
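  
  To make the `one grapheme = one search term` idea concrete, here is a rough 
sketch of it. This is not the code from the patch: `isHanIdeograph` and 
`splitTerms` are hypothetical names, only the basic CJK Unified Ideographs 
block (U+4E00..U+9FFF) is assumed, and whitespace handling is left out. Every 
character in that block becomes its own term, everything else (Latin, 
Hiragana, ...) stays grouped in runs.

```cpp
#include <string>
#include <vector>

// Assumption: only the basic CJK Unified Ideographs block is treated as
// "one character = one term". Other Han blocks are deliberately ignored.
bool isHanIdeograph(char32_t c)
{
    return c >= 0x4E00 && c <= 0x9FFF;
}

// Hypothetical helper, not Baloo API: split text into search terms.
// Each Han ideograph is emitted as a single-character term; any other
// characters are collected into runs and emitted as one term per run.
std::vector<std::u32string> splitTerms(const std::u32string &text)
{
    std::vector<std::u32string> terms;
    std::u32string run;
    for (char32_t c : text) {
        if (isHanIdeograph(c)) {
            // Flush the pending non-Han run before the ideograph.
            if (!run.empty()) {
                terms.push_back(run);
                run.clear();
            }
            terms.push_back(std::u32string(1, c));
        } else {
            run.push_back(c);
        }
    }
    if (!run.empty()) {
        terms.push_back(run);
    }
    return terms;
}
```

  With this sketch the file content from the test script, "otto东到宛平路anna", 
is split into "otto", five single-ideograph terms and "anna", while a pure 
Hiragana string such as 'ですか' stays one term.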
  
  @cfeck
  
  > if Baloo doesn't handle CJK, it maybe also doesn't handle other non-Latin 
scripts, so I suggest to use QChar::category()
  
  I wasn't aware of `QChar::category()`. Thank you.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11552

To: michaelh, #baloo, #frameworks, lbeltrame, bruns
Cc: alexeymin, cfeck, ashaposhnikov, michaelh, astippich, spoorun, 
nicolasfella, ngraham
