https://bugs.kde.org/show_bug.cgi?id=517639

            Bug ID: 517639
           Summary: OCR for Chinese text inserts unnecessary spaces
                    between characters
    Classification: Applications
           Product: Spectacle
Version First Reported In: unspecified
          Platform: Arch Linux
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: General
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]
  Target Milestone: ---

When using Spectacle's OCR function to recognize Simplified Chinese text, the
result contains an unnecessary space between every pair of adjacent Chinese
characters.

Example:
Expected: 这是中文测试
Actual: 这 是 中 文 测 试

This appears to happen because Spectacle uses Tesseract but does not
post-process its output to remove the spaces between characters for CJK
languages. The OCR result is hard to read and needs manual cleanup before
Chinese users can use it.

Steps to reproduce:
1. Take a screenshot containing Chinese text
2. Use Extract Text (OCR) in Spectacle
3. Paste the result; the Chinese characters have spaces between them

OBSERVED RESULT
这 是 中 文 测 试

EXPECTED RESULT
这是中文测试

SOFTWARE/OS VERSIONS
KDE Plasma Version: 6.6.2
KDE Frameworks Version: 6.23.0
Qt Version: 6.10.2
Spectacle: 6.6.2
Tesseract: 5.5.2
Distribution: Arch Linux

Suggestion:
Add post-processing to automatically remove spaces between CJK characters after
OCR.
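
To illustrate the suggested post-processing, here is a minimal sketch in
Python. It is not Spectacle's code (Spectacle is written in C++/Qt), and the
function name remove_cjk_spaces and the chosen character ranges are
assumptions made for this example. It removes spaces and tabs only when both
neighbours are Chinese or Japanese characters, so spaces in Latin text and at
script boundaries are kept. Hangul is deliberately left out because Korean
separates words with spaces.

```python
import re

# Assumed character ranges for scripts written without inter-word spaces.
# This list is illustrative, not exhaustive (e.g. no supplementary planes).
_CJK = (
    "\u3001-\u303f"  # CJK symbols and punctuation (U+3000 space excluded)
    "\u3040-\u30ff"  # Hiragana and Katakana
    "\u3400-\u4dbf"  # CJK Unified Ideographs Extension A
    "\u4e00-\u9fff"  # CJK Unified Ideographs
    "\uff01-\uff60"  # fullwidth forms, including punctuation such as "，"
)

# Match runs of spaces/tabs that have a CJK character on both sides.
# Lookarounds keep the neighbouring characters out of the match, so
# consecutive gaps such as "这 是 中" are all handled in one pass.
_CJK_GAP = re.compile(f"(?<=[{_CJK}])[ \\t]+(?=[{_CJK}])")


def remove_cjk_spaces(text: str) -> str:
    """Hypothetical helper: drop whitespace between adjacent CJK characters."""
    return _CJK_GAP.sub("", text)


if __name__ == "__main__":
    print(remove_cjk_spaces("这 是 中 文 测 试"))  # prints 这是中文测试
```

Line breaks are left untouched, so the paragraph structure of the recognized
text is preserved. A real fix would probably apply the same rule only when
the selected OCR language is one of the affected scripts.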
