[ https://issues.apache.org/jira/browse/TIKA-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866268#comment-15866268 ]
Tim Allison commented on TIKA-2265:
-----------------------------------
{noformat}
<w:footnote w:type="separator" w:id="0"><w:p w:rsidR="00D2605B"
w:rsidRDefault="00D2605B"><w:r><w:separator/></w:r></w:p></w:footnote>
<w:footnote w:type="continuationSeparator" w:id="1"><w:p w:rsidR="00D2605B"
w:rsidRDefault="00D2605B"><w:r><w:continuationSeparator/></w:r></w:p></w:footnote>
<w:footnote w:id="2">...actual footnote
{noformat}
Yep, that's a problem we should fix. We can't rely on the "id" being equal to
the footnote number. Looks like we have to calculate it dynamically by
skipping separators(?)...
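
A rough sketch of what "skipping separators" could look like, using only the JDK's DOM parser. The class and method names (FootnoteNumbering, displayNumbers) are made up for illustration and are not Tika or POI code. It also assumes the footnotes appear in footnotes.xml in display order, which Word does not guarantee; a real fix would probably have to follow the order of the footnote references in the document body.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Tika or POI.
final class FootnoteNumbering {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Maps the w:id of each real footnote to a 1-based display number. */
    static Map<String, Integer> displayNumbers(String footnotesXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            NodeList notes = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(footnotesXml)))
                    .getElementsByTagNameNS(W_NS, "footnote");

            Map<String, Integer> numbers = new LinkedHashMap<>();
            int next = 1;
            for (int i = 0; i < notes.getLength(); i++) {
                Element note = (Element) notes.item(i);
                // getAttributeNS returns "" when w:type is absent.
                String type = note.getAttributeNS(W_NS, "type");
                // Skip separator, continuationSeparator and any other
                // non-"normal" entry; these are not numbered footnotes.
                if (!type.isEmpty() && !"normal".equals(type)) {
                    continue;
                }
                // Assumption: file order equals display order.
                numbers.put(note.getAttributeNS(W_NS, "id"), next++);
            }
            return numbers;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse footnotes XML", e);
        }
    }
}
```

With the XML above, the entries with w:id 0 and 1 are skipped, so the footnote with w:id="2" would be reported as footnote 1.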
Thank you for submitting an example document.
> Problem with footnotes/endnotes in Tika.parseToString with MS Word (.docx)
> files
> --------------------------------------------------------------------------------
>
> Key: TIKA-2265
> URL: https://issues.apache.org/jira/browse/TIKA-2265
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.14
> Environment: N/A
> Reporter: Mike Rodent
> Assignee: Tim Allison
> Priority: Minor
> Labels: newbie
> Attachments: test.docx, test shorter.docx
>
>
> It seems that a footnote numbered "1" in the real document is output by
> Tika.parseToString() as "2", both in the footnote reference and in the
> corresponding footnote body text. Real footnote "2" becomes "3", "3" becomes
> "4", etc. I have not yet looked at the source code, but I can't imagine it
> would be difficult to correct this.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)