https://bugs.documentfoundation.org/show_bug.cgi?id=171983

            Bug ID: 171983
           Summary: FILESAVE: Non-breaking hyphens (U+2011) turned into
                    junk characters on conversion from ODT -> DOCX
           Product: LibreOffice
           Version: 26.2.1.2 release
          Hardware: x86-64 (AMD64)
                OS: Windows (All)
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Writer
          Assignee: [email protected]
          Reporter: [email protected]

Description:
When saving an ODT document as DOCX, non-breaking hyphens (U+2011) are not
handled correctly. In Word they look correct and retain their non-breaking
behavior, but they are in fact "junk" characters with no Unicode value. They
are therefore silently dropped or replaced with spaces when the text is copied
and pasted outside of Word. For example, when one such hyphen is copied into
the unicodeplus.com search field, it appears as a blank space and the search
result is "no character found".
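
For anyone who wants to check what a pasted character actually is without
relying on a website, the following is a minimal Python sketch. The function
name describe_chars is made up for illustration; it only uses the standard
unicodedata module and lists every control or non-ASCII character in a string.

```python
import unicodedata


def describe_chars(text):
    """List ("U+XXXX", name) for each control or non-ASCII character in text."""
    report = []
    for ch in text:
        code = ord(ch)
        # Skip ordinary printable ASCII; report everything else.
        if code < 0x20 or code > 0x7E:
            # Control characters have no Unicode name, so fall back to a marker.
            name = unicodedata.name(ch, "<no name>")
            report.append((f"U+{code:04X}", name))
    return report
```

Pasting text copied from the ODT and from the DOCX into this function should
make the difference visible: a preserved hyphen is reported as
("U+2011", "NON-BREAKING HYPHEN"), while anything else shows up under its own
code point.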

See the attached ODT and DOCX files for examples.

This behavior can have serious consequences, e.g. when a text written in
LibreOffice needs to be converted to DOCX before being sent off to a printer
(who requires submissions in DOCX format). When they move the text into
e.g. Adobe InDesign, the hyphens will be dropped.

Interestingly, if the text from the converted DOCX is copied and pasted back
into a blank LibreOffice document, the hyphens will "convert back" to U+2011.
This can be observed by copying the text from the DOCX to a blank ODT file and
then copying a hyphen from that ODT file into e.g. https://unicodeplus.com/

Steps to Reproduce:
1) Create and save an ODT document containing some non-breaking hyphens
(U+2011), entered e.g. by pressing Ctrl+Shift+-
2) Open the ODT document, and "Save As" DOCX.
3) Open the DOCX in Word, and observe the hyphens acting as expected.
4) Copy the hyphens elsewhere, e.g. to https://unicodeplus.com/. Observe that
they now appear as blank spaces.
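
The exported file can also be inspected directly, independent of Word and the
clipboard. The sketch below is only a diagnostic aid under assumptions not
confirmed in this report: that the main document part is stored at
word/document.xml, and that the hyphens are written either as literal U+2011
inside w:t text runs or as the WordprocessingML w:noBreakHyphen element. The
function names are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by DOCX files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_hyphen_encodings(document_xml):
    """Count the two assumed encodings of a non-breaking hyphen in document XML."""
    root = ET.fromstring(document_xml)
    # Literal U+2011 characters inside text runs.
    literal = sum((t.text or "").count("\u2011") for t in root.iter(f"{{{W_NS}}}t"))
    # Dedicated <w:noBreakHyphen/> elements.
    elements = sum(1 for _ in root.iter(f"{{{W_NS}}}noBreakHyphen"))
    return {"literal_u2011": literal, "noBreakHyphen_elements": elements}


def scan_docx(path):
    """Read word/document.xml from a DOCX file (assumed location) and count."""
    with zipfile.ZipFile(path) as docx:
        return count_hyphen_encodings(docx.read("word/document.xml"))
```

Running scan_docx on the attached DOCX would show which of the two forms, if
either, the export produced.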

Actual Results:
Non-breaking hyphens appear as "junk" characters with no Unicode value.

Expected Results:
Non-breaking hyphens preserved as U+2011 after conversion to DOCX.


Reproducible: Always


User Profile Reset: No

Additional Info:
PS: I have not verified whether the bug is present in earlier versions of
LibreOffice. I suspect it is not a new bug.
