https://bugs.documentfoundation.org/show_bug.cgi?id=136246

Dennis Roczek <dennisroc...@libreoffice.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |filter:rtf

--- Comment #12 from Dennis Roczek <dennisroc...@libreoffice.org> ---
Oooh, I just realized: the problem is not the content itself, it is the tab
character!

If it is replaced by a whitespace using search and replace, it is /mostly/
displayed correctly.

So basically it is in the file itself: \u8198\'20, which the latest RTF spec
(1.9.1) describes as follows:

------------------------------------
\uN This keyword represents a single Unicode character that has no equivalent
ANSI representation based on the current ANSI code page. N represents the
Unicode character value expressed as a decimal number.
This keyword is followed immediately by equivalent character(s) in ANSI
representation. In this way, old readers will ignore the \uN keyword and pick
up the ANSI representation properly.
When this keyword is encountered, the reader should ignore the next N'
characters, where N' corresponds to the last \ucN' value encountered.
As with all RTF keywords, a keyword-terminating space may be present (before
the ANSI characters) that is not counted in the characters to skip. While this
is not likely to occur (or recommended), a \binN keyword, its argument, and
the binary data that follows are considered one character for skipping
purposes. If an RTF scope delimiter character (that is, an opening or closing
brace) is encountered while scanning skippable data, the skippable data is
considered to end before the delimiter. This makes it possible for a reader to
perform some rudimentary error recovery. To include an RTF delimiter in
skippable data, it must be represented using the appropriate control symbol
(that is, escaped with a backslash,) as in plain text. Any RTF control word or
symbol is considered a single character for the purposes of counting skippable
characters.

An RTF writer, when it encounters a Unicode character with no corresponding
ANSI character, should output \uN followed by the best ANSI representation it
can manage. Often a question mark is used if no reasonable ANSI character
exists. In addition, if the Unicode character translates into an ANSI
character stream with a count of bytes differing from the current Unicode
Character Byte Count, it should emit the appropriate \ucN keyword prior to the
\uN keyword to notify the reader of the change.
Most RTF control words accept signed 16-bit numbers as arguments. For these
control words, Unicode values greater than 32767 are expressed as negative
numbers. For example, the character code U+F020 is given by \u-4064. To get
-4064, convert F020 (base 16) to decimal (61472) and subtract 65536.
Occasionally Word writes SYMBOL_CHARSET (nonUnicode) characters in the range
U+F020..U+F0FF instead of U+0020..U+00FF. Internally Word uses the values
U+F020..U+F0FF for these characters so that plain-text searches don’t
mistakenly match SYMBOL_CHARSET characters when searching for Unicode
characters in the range U+0020..U+00FF. To find out the correct symbol font to
use, e.g., Wingdings, Symbol, etc., find the last SYMBOL_CHARSET font control
word \fN used, look up font N in the font table and find the face name. The
charset is specified by the \fch

------------------------------------
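
To make the signed 16-bit rule from the quoted spec text concrete, here is a
small Python sketch of the conversion in both directions. The function names
are made up for illustration and are not taken from any RTF library or from
the LibreOffice code.

```python
def to_rtf_u_arg(code_point: int) -> int:
    # Map a BMP code point to the signed 16-bit decimal argument of \uN.
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("a single \\uN keyword only covers U+0000..U+FFFF")
    # Values above 32767 are written as negative numbers.
    return code_point - 0x10000 if code_point > 0x7FFF else code_point


def from_rtf_u_arg(n: int) -> int:
    # Inverse mapping: turn the \uN argument back into a code point.
    return n + 0x10000 if n < 0 else n


print(to_rtf_u_arg(0xF020))   # -4064, the example from the spec
print(from_rtf_u_arg(-4064))  # 61472, i.e. 0xF020
print(to_rtf_u_arg(0x2006))   # 8198, small enough to stay positive
```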

So since LibreOffice /seems/ not to recognize \u8198, it should fall back to
the ANSI representation \'20 and display only a whitespace.
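
A rough Python sketch of how I read the skipping rule, applied to the
\u8198\'20 sequence from the file. This is only an illustration, not how the
LibreOffice RTF filter is implemented: the tokenizer is hypothetical, knows
only \ucN, \uN, \'hh and plain characters, ignores braces and \bin, and
assumes cp1252 as the ANSI code page and a default skip count of 1.

```python
import re

# Hypothetical mini-tokenizer: \ucN, \uN, \'hh, or any single character.
# An optional keyword-terminating space is consumed with the keyword.
TOKEN = re.compile(r"\\uc(\d+) ?|\\u(-?\d+) ?|\\'([0-9a-fA-F]{2})|(.)", re.S)


def decode(rtf: str, unicode_aware: bool = True, codepage: str = "cp1252") -> str:
    out = []
    uc = 1    # assumed default number of fallback characters per \uN
    skip = 0  # fallback characters still to be ignored
    for m in TOKEN.finditer(rtf):
        uc_arg, u_arg, hex_byte, plain = m.groups()
        if uc_arg is not None:
            uc = int(uc_arg)
        elif u_arg is not None:
            if unicode_aware:
                n = int(u_arg)
                out.append(chr(n + 0x10000 if n < 0 else n))
                skip = uc
            # An old reader ignores the unknown keyword and skips nothing.
        elif skip > 0:
            skip -= 1  # drop one character of the ANSI fallback
        elif hex_byte is not None:
            out.append(bytes.fromhex(hex_byte).decode(codepage))
        else:
            out.append(plain)
    return "".join(out)


sample = r"a\u8198\'20b"
print(repr(decode(sample)))                       # Unicode-aware reader
print(repr(decode(sample, unicode_aware=False)))  # old reader: plain space
```

With unicode_aware=False the sketch behaves like the "old reader" in the spec
text: \u8198 is ignored and \'20 yields an ordinary space.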

I guess 8198 is this character: https://www.codetable.net/decimal/8198
("Six-Per-Em Space", but isn't this \u2006?!?).
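
A quick check in Python, using only the standard library: the spec text quoted
above says N in \uN is a decimal number, while the U+2006 notation is
hexadecimal, so both should name the same code point.

```python
import unicodedata

n = 8198                         # decimal argument taken from \u8198
print(f"U+{n:04X}")              # U+2006
print(unicodedata.name(chr(n)))  # SIX-PER-EM SPACE
print(n == 0x2006)               # True
```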

So why do we not recognize that character? *g*
