Partial/Incomplete text extraction for certain PowerPoint files
---------------------------------------------------------------
Key: TIKA-684
URL: https://issues.apache.org/jira/browse/TIKA-684
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 0.9
Reporter: Jonathan LI
Example file with issue attached.
Tika throws an exception during text extraction of certain PowerPoint files.
In this example file, the extracted text stops at slide 37; the text from
slides 38-40 is missing.
Tested with both the Tika library and the Tika GUI. Apache POI (3.8 beta 3 and
3.7) extracts the text from this file without any issues.