[
https://issues.apache.org/jira/browse/TIKA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112270#comment-15112270
]
ASF GitHub Bot commented on TIKA-1840:
--------------------------------------
GitHub user zetisam opened a pull request:
https://github.com/apache/tika/pull/72
fix for TIKA-1840 contributed by zetisam
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zetisam/tika TIKA-1840
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tika/pull/72.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #72
----
commit 52b82bddef7c7ae8a430c9871594295e71882055
Author: Sam Heijens <[email protected]>
Date: 2016-01-22T10:09:48Z
fix for TIKA-1840 contributed by zetisam
----
> No way to link slide notes to slide in PPT output.
> --------------------------------------------------
>
> Key: TIKA-1840
> URL: https://issues.apache.org/jira/browse/TIKA-1840
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.11
> Reporter: Sam H
>
> I'm integrating Apache Tika into my project, and I want to extract (text)
> information from PowerPoint slides, in both PPT and PPTX formats.
> I've noticed that with the PPT format, the slide notes are all aggregated at the
> end of the XML output, and there is no way to identify which note belongs to
> which slide.
> I began looking at the code and found the following:
> {code}
> // TODO Find the Notes for this slide and extract inline
> {code}
> in
> [HSLFExtractor.java|https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java]
> on line 140
> I would like to implement this part and contribute it.
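
To illustrate the behaviour the issue asks for, here is a minimal, self-contained sketch that writes each slide's notes directly after that slide's content rather than collecting all notes at the end. It is not the code from the pull request and does not use the Tika or POI APIs: the `Slide` class, the `renderInline` method, and the `slide` / `slide-notes` class names and `slide-N` ids are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class InlineNotesSketch {

    /** Hypothetical stand-in for a parsed slide; not a Tika or POI type. */
    public static final class Slide {
        final String text;
        final String notes; // null when the slide has no notes

        public Slide(String text, String notes) {
            this.text = text;
            this.notes = notes;
        }
    }

    /**
     * Emits one element per slide and nests the notes inside it, so a
     * consumer can tell which note belongs to which slide.
     * Assumption: text is already safe to embed; a real extractor would
     * escape it or write through a SAX content handler instead.
     */
    public static String renderInline(List<Slide> slides) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < slides.size(); i++) {
            Slide s = slides.get(i);
            out.append("<div class=\"slide\" id=\"slide-").append(i + 1).append("\">");
            out.append("<p>").append(s.text).append("</p>");
            if (s.notes != null) {
                // Notes are written here, inside the owning slide's element.
                out.append("<div class=\"slide-notes\">").append(s.notes).append("</div>");
            }
            out.append("</div>\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Slide> slides = new ArrayList<>();
        slides.add(new Slide("Intro", "Welcome the audience"));
        slides.add(new Slide("Agenda", null));
        System.out.print(renderInline(slides));
    }
}
```

With output shaped like this, the notes for a given slide can be found by looking inside that slide's element, which is the link that is missing when all notes are appended after the last slide.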