[ https://issues.apache.org/jira/browse/TIKA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann updated TIKA-1840:
------------------------------------
Fix Version/s: 1.16 (was: 1.15)
> No way to link slide notes to slide in PPT output.
> --------------------------------------------------
>
> Key: TIKA-1840
> URL: https://issues.apache.org/jira/browse/TIKA-1840
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.11
> Reporter: Sam H
> Assignee: Chris A. Mattmann
> Fix For: 1.16
>
>
> I'm integrating Apache Tika into my project, and I want to extract text
> from PowerPoint slides in both PPT and PPTX formats.
> I've noticed that with the PPT format, the slide notes are all aggregated at the
> end of the XML output, and there is no way to identify which note belongs to
> which slide.
> I began looking at the code and found the following:
> {code}
> // TODO Find the Notes for this slide and extract inline
> {code}
> in
> [HSLFExtractor.java|https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java],
> on line 140.
> I would like to implement this part and contribute it.
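
To make the requested behavior concrete, here is a minimal sketch of the output shape the description asks for: each slide's notes emitted inside that slide's own element rather than collected at the end. It does not use the Tika or POI APIs. The `Slide` class, the `render` method, and the `slide` / `slide-notes` class names are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SlideNotesSketch {

    /** Hypothetical stand-in for a parsed slide: body text plus optional notes. */
    public static final class Slide {
        final String text;
        final String notes; // null when the slide has no notes

        public Slide(String text, String notes) {
            this.text = text;
            this.notes = notes;
        }
    }

    /** Escapes the characters that are significant in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Writes one div per slide and nests the notes inside it, so a consumer
     * can tell which note belongs to which slide. Element and class names
     * are assumptions, not Tika's actual output.
     */
    public static String render(List<Slide> slides) {
        StringBuilder out = new StringBuilder();
        for (Slide slide : slides) {
            out.append("<div class=\"slide\">");
            out.append("<p>").append(escape(slide.text)).append("</p>");
            if (slide.notes != null) {
                out.append("<div class=\"slide-notes\">")
                   .append(escape(slide.notes))
                   .append("</div>");
            }
            out.append("</div>\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Slide> slides = new ArrayList<>();
        slides.add(new Slide("Intro", "Welcome the audience"));
        slides.add(new Slide("Agenda", null));
        System.out.print(render(slides));
    }
}
```

With this layout, a slide without notes simply has no nested notes element, and no separate pass is needed to match notes back to slides.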