[Libreoffice-bugs] [Bug 92161] GBK encoded Chinese text not auto-detected

bugzilla-daemon Wed, 22 Dec 2021 22:23:38 -0800

https://bugs.documentfoundation.org/show_bug.cgi?id=92161


--- Comment #17 from Mike Kaganski <[email protected]> ---
(In reply to Daniel Thomas from comment #16)
> Have created https://gerrit.libreoffice.org/c/core/+/127347 to fix this.

Thanks - merged! :)

> Though now I'm wondering whether we could modify that code to support all of
> the encodings in LO?

It should be relatively easy. We already have
rtl_getTextEncodingFromMimeCharset, which seems to accept the kind of charset
names that ucsdet_getName returns. The only concern here would be false
detections, and we could use ucsdet_getConfidence [1] to filter out unreliable
ones.
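
A rough sketch of that filtering idea, not the actual LibreOffice code. It
assumes the candidate names and confidences have already been read from ICU
(ucsdet_getName / ucsdet_getConfidence); CharsetCandidate, pickReliableCharset
and the threshold of 50 are made up for illustration only:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// One detection candidate. In real code both fields would come from
// ucsdet_getName() and ucsdet_getConfidence() on a UCharsetMatch; plain
// values are used here so the sketch has no ICU dependency.
struct CharsetCandidate
{
    std::string name; // charset name as reported by the detector, e.g. "GB18030"
    int confidence;   // ICU reports confidence on a 0..100 scale
};

// Hypothetical helper: return the name of the highest-confidence candidate,
// or nothing if even that one is below minConfidence. The default threshold
// is an arbitrary placeholder and would need tuning against real files.
std::optional<std::string> pickReliableCharset(
    const std::vector<CharsetCandidate>& candidates, int minConfidence = 50)
{
    assert(minConfidence >= 0 && minConfidence <= 100);

    const CharsetCandidate* best = nullptr;
    for (const auto& candidate : candidates)
    {
        if (!best || candidate.confidence > best->confidence)
            best = &candidate;
    }

    if (!best || best->confidence < minConfidence)
        return std::nullopt; // treat as "not detected", caller keeps its fallback

    return best->name;
}
```

The returned name would then go to rtl_getTextEncodingFromMimeCharset, and the
caller would presumably also have to handle names it cannot map (I believe it
returns RTL_TEXTENCODING_DONTKNOW in that case).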

Feel free to submit a new enhancement, and then fix it - that would be a nice
hack!

[1]
https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/ucsdet_8h.html#a30dd8812653be28766f1ee1bbc412c18

-- 
You are receiving this mail because:
You are the assignee for the bug.
