https://bugs.kde.org/show_bug.cgi?id=517639
Noah Davis <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REPORTED                    |NEEDSINFO
         Resolution|---                         |WAITINGFORINFO

--- Comment #1 from Noah Davis <[email protected]> ---
> This is because Spectacle uses Tesseract but does not perform post-processing
> to remove character spaces for CJK languages. The OCR result is unreadable
> for Chinese users.

Is it common practice to post-process Tesseract's CJK output? I'm a bit wary of doing our own processing of Tesseract output separately from Tesseract's own options.

I am not an expert on the various forms of Chinese, Japanese and Korean scripts. While I'm sure you have far more experience with Chinese scripts than I do, it would be nice to follow some kind of standard instead of just doing our own thing.

One could also argue on a technical level that improving Tesseract itself is the correct solution, but I don't know how difficult that would be.

-- 
You are receiving this mail because:
You are watching all bug changes.
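For illustration only, here is a minimal sketch of the kind of post-processing the reporter describes. It assumes Tesseract emits ordinary spaces or tabs between adjacent Chinese or Japanese characters. The function name `remove_cjk_spaces` and the exact Unicode ranges are hypothetical choices, not part of Spectacle or Tesseract.

```python
import re

# Hypothetical character class: CJK punctuation (excluding U+3000 ideographic
# space), Hiragana/Katakana, CJK Unified Ideographs (incl. Extension A), and
# fullwidth punctuation/forms. Hangul is deliberately NOT included.
_CJK = "[\u3001-\u303f\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uff01-\uff60]"

# Match runs of spaces/tabs that sit between two CJK characters.
# Newlines are not matched, so OCR line breaks are preserved.
_SPACES_BETWEEN_CJK = re.compile(rf"(?<={_CJK})[ \t]+(?={_CJK})")


def remove_cjk_spaces(text: str) -> str:
    """Drop spaces between adjacent Chinese/Japanese characters (sketch)."""
    return _SPACES_BETWEEN_CJK.sub("", text)


if __name__ == "__main__":
    print(remove_cjk_spaces("你 好 ， 世 界 。"))   # Chinese: spaces removed
    print(remove_cjk_spaces("Hello 你 好 world"))  # Latin word spacing kept
    print(remove_cjk_spaces("안녕 하세요"))         # Korean left untouched
```

Hangul is excluded on purpose: modern Korean separates words with spaces, so a blanket "remove spaces for CJK" rule would damage Korean text. That per-script difference is one reason why following an established convention, or fixing this in Tesseract itself, may be preferable to ad-hoc processing.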
