https://bugs.documentfoundation.org/show_bug.cgi?id=146429
Bug ID: 146429
Summary: Fallback to other character encodings detected by ICU
above a certain confidence threshold
Product: LibreOffice
Version: unspecified
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: enhancement
Priority: medium
Component: LibreOffice
Assignee: [email protected]
Reporter: [email protected]
Description:
As discussed in Bug 92161, we should modify SwIoSystem::IsDetectableText so
that if none of the encodings we explicitly check for match, we fall back to
whatever ucsdet_getName (from the ICU library) returns, provided LO supports
that encoding. We can use ucsdet_getConfidence to filter out any match below a
certain confidence threshold.
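
The filtering step described above could look roughly like the sketch below. It
is only an illustration of the decision logic, not the actual LibreOffice code:
the struct DetectedCharset, the function pickFallbackCharset, the supported-name
set and the threshold value are all hypothetical. In a real patch the name and
confidence would come from ucsdet_getName and ucsdet_getConfidence, and the
"supported" check would be whatever LO uses to map a charset name to one of its
own text encodings.

```cpp
#include <set>
#include <string>

// Hypothetical holder for one detection result. With ICU, `name` would be the
// string returned by ucsdet_getName() and `confidence` the value returned by
// ucsdet_getConfidence(), which ranges from 0 to 100.
struct DetectedCharset
{
    std::string name;
    int confidence;
};

// Decide whether to fall back to the detected charset.
// Returns true and writes the charset name to `chosen` only if the match is at
// or above `minConfidence` AND the name is in `supported` (an assumed stand-in
// for "LO supports it"). Otherwise returns false and leaves `chosen` untouched,
// so the caller can keep its existing default.
bool pickFallbackCharset(const DetectedCharset& detected,
                         const std::set<std::string>& supported,
                         int minConfidence,
                         std::string& chosen)
{
    if (detected.confidence < minConfidence)
        return false; // too uncertain, do not override the default
    if (supported.count(detected.name) == 0)
        return false; // detected, but we cannot import this encoding
    chosen = detected.name;
    return true;
}
```

The fallback would only be consulted after all of the explicit checks in
SwIoSystem::IsDetectableText have failed, so existing detection behaviour stays
unchanged. The right threshold is an open question and would need testing
against real files.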
Steps to Reproduce:
1. Save a text file in one of the encodings we don't detect, e.g. EUC-KR with
some Korean text pasted in.
2. Open the file in LO Writer.
Actual Results:
The text displays incorrectly because it is assumed to be Unicode or some
other encoding.
Expected Results:
The file's encoding is correctly determined and the text displays properly.
Reproducible: Always
User Profile Reset: No
Additional Info:
See description.