[ 
https://issues.apache.org/jira/browse/TIKA-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Rodent updated TIKA-2264:
------------------------------
    Attachment: ImprovedODFContentParser.java

Note that this is peppered with multiple comments by me.  It also contains 
various methods which I used in developing these changes, and it uses a 
LOGGER to log any anomalies.  As a complete newb to this process of hopefully 
contributing to an open source project, I invite everyone to mess around with 
it as they see fit.

> Better handling of footnotes/endnotes for ODF files
> ---------------------------------------------------
>
>                 Key: TIKA-2264
>                 URL: https://issues.apache.org/jira/browse/TIKA-2264
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.14
>         Environment: N/A
>            Reporter: Mike Rodent
>            Priority: Minor
>              Labels: newbie
>         Attachments: ImprovedODFContentParser.java
>
>
> Springs from my question here 
> (http://stackoverflow.com/questions/42031237/modify-apache-tika-parsing-of-old-1997-2003-ms-word-docs)
>  ... I have improved the class OpenDocumentContentParser so that it puts 
> footnotes/endnotes at the end of the line to which they belong and doesn't 
> break up the line in question.  As with .docx parsing, the notes can be linked 
> to the reference easily.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
