https://bugs.documentfoundation.org/show_bug.cgi?id=125995
--- Comment #2 from Stephan Bergmann <[email protected]> ---
(In reply to Stephan Bergmann from comment #1)
> Arguably, according to my above explanation, the gen file picker shows the
> right file name here. With LANG=C, osl_getThreadTextEncoding() effectively
> is RTL_TEXTENCODING_ISO_8859_1 (though technically it is
> RTL_TEXTENCODING_ASCII_US), so you get "ÅÄka.jpg".
(Above and below, Bugzilla apparently dropped the C1 control characters \U+0082
and \U+0085 from "ÅÄka.jpg", where they should appear after "Å" and after "Ä",
respectively.)
> The kde5 and gtk3 file pickers presumably use external library code that
> doesn't follow LO's convention of interpreting pathnames' byte sequences
> according to the system locale, but instead always interpret them as UTF-8.
> That would explain why the kde5 file picker dialog shows the file's name as
> "łąka.png" instead of "ÅÄka.jpg". But once the kde5 file picker has passed
> the <file:///.../%C5%82%C4%85ka.jpg> URL (which is the same URL as the gen
> file picker passes) to LO's internals, LO will again treat that as
> representing a pathname whose bytes are interpreted according to
> osl_getThreadTextEncoding().
Sorry, the above "which is the same URL as the gen file picker passes" is
wrong: With LANG=C, LO interprets that file name as written with the
characters
\U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
\U+0082 <control>
\U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
\U+0085 <control>
\U+006B LATIN SMALL LETTER K
...
and "LO internal file URLs" always have their "payload" encoded as UTF-8 (see
udkapi/com/sun/star/uri/XExternalUriReferenceTranslator.idl), so the LO
internal file URL that the gen file picker returns is
<file:///.../%C3%85%C2%82%C3%84%C2%85ka.png>. (And when LO wants to access the
actual file and converts that URL back to a pathname byte sequence under
LANG=C, it first converts from the URL syntax "%C3%85%C2%82%C3%84%C2%85ka.png"
to an OUString containing
\U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
\U+0082 <control>
\U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
\U+0085 <control>
\U+006B LATIN SMALL LETTER K
...
code units, and then, because of the osl_getThreadTextEncoding() mandated by
LANG=C, to the correct byte sequence "\xC5\x82\xC4\x85ka.png".)
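
The round trip described above can be checked with a small sketch. It uses
Python's standard library as a stand-in for LO's own conversion code (which it
is not), and it assumes, as stated above, that the thread text encoding under
LANG=C is effectively ISO-8859-1. The variable names are made up for
illustration.

```python
from urllib.parse import quote, unquote

# Pathname bytes as stored on disk (these happen to be UTF-8 for "łąka.png").
raw = b"\xC5\x82\xC4\x85ka.png"

# Assumption: under LANG=C the bytes are read as ISO-8859-1,
# so every byte maps to exactly one code point.
as_latin1 = raw.decode("iso-8859-1")

# LO-internal file URLs carry their payload as percent-encoded UTF-8.
internal_payload = quote(as_latin1, safe="", encoding="utf-8")

# Reverse direction: URL syntax -> string -> bytes in the thread text encoding.
round_trip = unquote(internal_payload, encoding="utf-8").encode("iso-8859-1")

# For comparison: a picker that always assumes UTF-8 reads the same bytes
# as "łąka.png" and percent-encodes the raw bytes directly.
as_utf8 = raw.decode("utf-8")
external_payload = quote(raw, safe="")

print([f"U+{ord(c):04X}" for c in as_latin1[:5]])
print(internal_payload)
print(external_payload)
print(round_trip == raw)
```

The two payloads differ even though they name the same file, which is why the
"%C5%82%C4%85" form cannot be handed to LO's internals unchanged under LANG=C.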