https://bugs.documentfoundation.org/show_bug.cgi?id=151830

            Bug ID: 151830
           Summary: Save as text with coding utf-8 destroys all non-ascii
                    characters, replacing with question mark
           Product: LibreOffice
           Version: 7.3.5.2 release
          Hardware: All
                OS: Windows (All)
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Writer
          Assignee: [email protected]
          Reporter: [email protected]

My version: 
Version: 7.3.5.2 (x64) / LibreOffice Community
Build ID: 184fe81b8c8c30d8b5082578aee2fed2ea847c01
CPU threads: 8; OS: Windows 10.0 Build 22621; UI render: Skia/Raster; VCL: win
Locale: nb-NO (nb_NO); UI: nb-NO
Calc: threaded

I save a file using File->Save As or File->Save a Copy and set the file type to
"Text - choose a coding". In the filter options dialog, I choose the encoding
"Unicode (UTF-8)" and the line ending "LF".

Then I inspect the resulting file using od (octal dump) with options that show
each byte both as an ASCII character and as a hex code (od -c -t x1).
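For comparison, here is a minimal sketch of what correctly encoded text looks like under the same kind of dump. It writes the last words of the sample sentence as UTF-8 using octal escapes, so the bytes do not depend on the terminal's own encoding (a choice made only for this illustration), and pipes them through od:

```shell
#!/bin/sh
# "små sjøfly" written byte by byte as UTF-8:
#   å = U+00E5 -> \303\245 (hex c3 a5)
#   ø = U+00F8 -> \303\270 (hex c3 b8)
printf 'sm\303\245 sj\303\270fly\n' | od -c -t x1
```

In a correct UTF-8 export, each å should therefore appear in the hex rows as the pair c3 a5 and each ø as c3 b8, not as a single 3f.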

The original document begins with these two lines:

Ferden til boplassen
Endelig stod jeg der. Langs den lille kanalen foran meg lå tre små sjøfly.

Note the three non-ASCII characters (å, å and ø) in the last four words.

Inspecting the saved file, I find the following:

$ od -c -t x1 Ren-tekst-versjon.txt | head -30
0000000                               1   .   F   e   r   d   e   n
         20  20  20  20  20  20  20  31  2e  46  65  72  64  65  6e  20
0000020   t   i   l       b   o   p   l   a   s   s   e   n  \n  \n   E
         74  69  6c  20  62  6f  70  6c  61  73  73  65  6e  0a  0a  45
0000040   n   d   e   l   i   g       s   t   o   d       j   e   g
         6e  64  65  6c  69  67  20  73  74  6f  64  20  6a  65  67  20
0000060   d   e   r   .       L   a   n   g   s       d   e   n       l
         64  65  72  2e  20  4c  61  6e  67  73  20  64  65  6e  20  6c
0000100   i   l   l   e       k   a   n   a   l   e   n       f   o   r
         69  6c  6c  65  20  6b  61  6e  61  6c  65  6e  20  66  6f  72
0000120   a   n       m   e   g       l   ?       t   r   e       s   m
         61  6e  20  6d  65  67  20  6c  3f  20  74  72  65  20  73  6d
0000140   ?       s   j   ?   f   l   y   .       T   e   r   m   i   n
         3f  20  73  6a  3f  66  6c  79  2e  20  54  65  72  6d  69  6e

(The first line is a heading. In the output it is indented by seven spaces and
prefixed with "1." (seven 20 bytes followed by 31 2e), which I did not expect.
In the original, it is not indented. The second line is part of a longer
paragraph and is saved as a single long line; this is expected and OK.)

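The issue in this report is that the characters å and ø are replaced with
question marks (byte 3f). In UTF-8, å should be stored as the two bytes c3 a5
and ø as c3 b8. It seems the text has not been converted to UTF-8 at all, but
rather to ASCII, with the non-ASCII characters replaced by "?".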
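The diagnosis above can also be checked mechanically. The following is a rough sketch, not a LibreOffice tool: `has_c3_bytes` is a hypothetical helper that relies on the fact that the UTF-8 encodings of æ, ø and å (and Æ, Ø, Å) all begin with the byte c3, so a file where these letters were flattened to "?" contains no c3 byte at all. It is a heuristic for Norwegian text only. It shows whether such letters survived, not whether every one of them did.

```shell
#!/bin/sh
# Hypothetical helper (not part of LibreOffice): succeed if the file
# contains the UTF-8 lead byte 0xc3 that starts every encoded æ/ø/å,
# fail if no such byte is present (e.g. letters replaced by '?').
has_c3_bytes() {
  # -A n drops the offset column, -v disables '*' line folding,
  # -t x1 prints each byte as " xx", so ' c3' matches exactly byte c3.
  od -A n -v -t x1 "$1" | grep -q ' c3'
}

# Demonstration on two sample files (illustrative data only):
good=$(mktemp); bad=$(mktemp)
printf 'l\303\245 tre sm\303\245 sj\303\270fly.\n' > "$good"  # proper UTF-8
printf 'l? tre sm? sj?fly.\n' > "$bad"                        # as in the dump
for f in "$good" "$bad"; do
  if has_c3_bytes "$f"; then
    echo "$f: UTF-8 letter bytes present"
  else
    echo "$f: no c3 lead bytes (non-ASCII letters lost)"
  fi
done
rm -f "$good" "$bad"
```

Against the exported file, `has_c3_bytes Ren-tekst-versjon.txt` would be expected to fail, since the dump shows 3f where c3 a5 and c3 b8 should be.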
