Theodor Sjöstedt created TIKA-1428:
--------------------------------------
Summary: Microsoft Word 97 - 2003 (.doc) footnote references are
Unicode Replacement Character
Key: TIKA-1428
URL: https://issues.apache.org/jira/browse/TIKA-1428
Project: Tika
Issue Type: Bug
Affects Versions: 1.6, 1.4
Reporter: Theodor Sjöstedt
Priority: Minor
Footnotes from {{.doc}} documents are extracted, but the references to the
footnotes are replaced by the Unicode Replacement Character (�).
I have tried this in 1.4 and 1.6.
In 1.4, both the reference in the text and the reference at the footnote are
replaced.
In 1.6, the reference in the text disappears completely.
See the attached image for the original document, the 1.4 formatted text, and
the 1.6 formatted text.
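
To make the symptom easy to check without comparing screenshots, here is a minimal sketch that counts occurrences of the Unicode Replacement Character (U+FFFD) in a string of extracted text. The class name, method name, and sample string are hypothetical and only for illustration. How the text is obtained from Tika is left out; the sketch assumes it is already available as a String.

```java
public class ReplacementCharCheck {
    // U+FFFD, the Unicode Replacement Character described in this report.
    static final char REPLACEMENT_CHAR = '\uFFFD';

    // Counts how many times U+FFFD occurs in the given text.
    public static int countReplacementChars(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == REPLACEMENT_CHAR) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for extracted text (not taken from the
        // attached document): one replaced reference in the body and one
        // at the footnote itself, as described for 1.4.
        String extracted = "Body text\uFFFD continues.\n\uFFFD The footnote text.";
        System.out.println("Replacement characters found: "
                + countReplacementChars(extracted));
    }
}
```

A non-zero count would match the 1.4 behaviour described above. The 1.6 behaviour, where the in-text reference is missing entirely, would not be caught by this check alone.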