https://bugs.documentfoundation.org/show_bug.cgi?id=151830
Bug ID: 151830
Summary: Save as text with coding utf-8 destroys all non-ascii
characters, replacing with question mark
Product: LibreOffice
Version: 7.3.5.2 release
Hardware: All
OS: Windows (All)
Status: UNCONFIRMED
Severity: normal
Priority: medium
Component: Writer
Assignee: [email protected]
Reporter: [email protected]
My version:
Version: 7.3.5.2 (x64) / LibreOffice Community
Build ID: 184fe81b8c8c30d8b5082578aee2fed2ea847c01
CPU threads: 8; OS: Windows 10.0 Build 22621; UI render: Skia/Raster; VCL: win
Locale: nb-NO (nb_NO); UI: nb-NO
Calc: threaded
I save a file using File->Save as or File->Save a copy and set the File type
to "Text - choose a coding". In the filter selection dialog, I choose the
encoding "Unicode (UTF-8)" and the line ending "LF".
Then I inspect the resulting file using od (octal dump), with options to show
each byte both as an ASCII character and as a hex code (od -c -t x1).
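For readers without od at hand (for example on plain Windows), the same kind
of inspection can be approximated in Python. This is only a rough sketch: the
function names show and dump are made up for this illustration, and the output
imitates the layout of od -c -t x1 without reproducing every od escape.

```python
def show(byte: int) -> str:
    """Render one byte roughly the way od -c does."""
    if byte == 0x0A:
        return "\\n"                  # newline shown as the two characters \n
    if 0x20 <= byte < 0x7F:
        return chr(byte)              # printable ASCII (space stays a space)
    return format(byte, "03o")        # anything else as three octal digits


def dump(data: bytes, width: int = 16) -> str:
    """Return an od-like dump: octal offset, character row, hex row."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        chars = "".join(f"{show(b):>4}" for b in chunk)
        hexes = "".join(f"{format(b, '02x'):>4}" for b in chunk)
        lines.append(f"{offset:07o}{chars}".rstrip())
        lines.append(" " * 7 + hexes)
    return "\n".join(lines)


# Demo on a few bytes; for a real file, pass open(path, "rb").read() instead.
print(dump(b"l? tre\n"))
```

Any byte of 80 hex or above in the hex row indicates a non-ASCII character; a
3f where a letter such as å was expected indicates a substituted question mark.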
The original document begins with these two lines:
Ferden til boplassen
Endelig stod jeg der. Langs den lille kanalen foran meg lå tre små sjøfly.
Notice the three non-ASCII characters (å, å and ø) in the last four words.
Inspecting the saved file, I find the following:
$ od -c -t x1 Ren-tekst-versjon.txt | head -30
0000000                               1   .   F   e   r   d   e   n
         20  20  20  20  20  20  20  31  2e  46  65  72  64  65  6e  20
0000020   t   i   l       b   o   p   l   a   s   s   e   n  \n  \n   E
         74  69  6c  20  62  6f  70  6c  61  73  73  65  6e  0a  0a  45
0000040   n   d   e   l   i   g       s   t   o   d       j   e   g
         6e  64  65  6c  69  67  20  73  74  6f  64  20  6a  65  67  20
0000060   d   e   r   .       L   a   n   g   s       d   e   n       l
         64  65  72  2e  20  4c  61  6e  67  73  20  64  65  6e  20  6c
0000100   i   l   l   e       k   a   n   a   l   e   n       f   o   r
         69  6c  6c  65  20  6b  61  6e  61  6c  65  6e  20  66  6f  72
0000120   a   n       m   e   g       l   ?       t   r   e       s   m
         61  6e  20  6d  65  67  20  6c  3f  20  74  72  65  20  73  6d
0000140   ?       s   j   ?   f   l   y   .       T   e   r   m   i   n
         3f  20  73  6a  3f  66  6c  79  2e  20  54  65  72  6d  69  6e
(The first line is a heading, here indented by seven spaces, which I did not
expect. In the original, it is not indented. The second line is part of a
longer paragraph and is saved as a single long line - this is expected and OK.)
The issue in this report is that the characters å and ø are replaced with
question marks (byte 3f in the dump). It seems the file has been converted to
ASCII rather than to UTF-8.
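To make the expected result concrete, here is a small Python sketch comparing
the bytes a correct UTF-8 export of the affected words would contain with the
bytes produced by a lossy conversion to ASCII. It says nothing about how
LibreOffice performs the export internally; the helper hex_bytes is made up
for this illustration, and ASCII is used only as one example of a character
set that lacks å and ø.

```python
# The affected words from the second line: "lå tre små sjøfly"
# (written with escapes so the script does not depend on its own file encoding).
text = "l\u00e5 tre sm\u00e5 sj\u00f8fly"


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


# What a UTF-8 export should contain: å -> c3 a5, ø -> c3 b8.
expected = text.encode("utf-8")

# A lossy conversion that substitutes "?" for every unencodable character.
lossy = text.encode("ascii", errors="replace")

print("expected:", hex_bytes(expected))
print("lossy:   ", hex_bytes(lossy))
```

The lossy byte sequence (6c 3f 20 74 72 65 20 73 6d 3f 20 73 6a 3f 66 6c 79)
is the one that appears in the dump at offsets 0000127 to 0000147, whereas a
UTF-8 file would contain c3 a5 for each å and c3 b8 for ø.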