[ https://issues.apache.org/jira/browse/TIKA-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Theodor Sjöstedt updated TIKA-1428:
-----------------------------------
Attachment: TIKA-doc-footnotes-issue.png
Original document on the left.
TIKA 1.4 in the center.
TIKA 1.6 on the right.
> Microsoft Word 97 - 2003 (.doc) footnote references are Unicode Replacement
> Character
> -------------------------------------------------------------------------------------
>
> Key: TIKA-1428
> URL: https://issues.apache.org/jira/browse/TIKA-1428
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.4, 1.6
> Reporter: Theodor Sjöstedt
> Priority: Minor
> Attachments: TIKA-doc-footnotes-issue.png
>
>
> Footnotes from {{.doc}} documents are extracted, but the references to the
> footnotes are replaced by the Unicode Replacement Character (�).
> I have tried this in 1.4 and 1.6.
> In 1.4, both the reference in the text and the reference at the footnote are replaced.
> In 1.6, the reference in the text has disappeared completely.
> See the attached image for the original document, the 1.4 formatted text, and the 1.6
> formatted text.
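To check whether a particular extraction shows the 1.4 symptom, one option is to scan the extracted text for U+FFFD. The sketch below is a hypothetical diagnostic helper (the class and method names are illustrative and not part of Tika). It assumes the text has already been obtained from Tika, for example as the string collected by a BodyContentHandler, and it only reports the offsets of replacement characters. Note that it cannot detect the 1.6 behaviour: there the in-text reference is dropped rather than replaced, so finding no U+FFFD does not mean the references survived.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper (not part of Tika): locates U+FFFD in extracted text. */
public class ReplacementCharScanner {
    /** U+FFFD REPLACEMENT CHARACTER, the symptom reported in this issue. */
    public static final char REPLACEMENT_CHAR = '\uFFFD';

    /** Returns the zero-based index of every U+FFFD in the given text. */
    public static List<Integer> findReplacementChars(String text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == REPLACEMENT_CHAR) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical sample mimicking the 1.4 output described above:
        // one lost reference in the body text and one at the footnote itself.
        String extracted = "Body text\uFFFD continues.\n\uFFFD Footnote text.";
        System.out.println(findReplacementChars(extracted)); // prints [9, 22]
    }
}
```

Offsets are reported as UTF-16 char indices into the Java string, which is enough to locate each damaged reference in the extracted text.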
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)