Partial/Incomplete text extraction for certain PowerPoint files
---------------------------------------------------------------

                 Key: TIKA-684
                 URL: https://issues.apache.org/jira/browse/TIKA-684
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.9
            Reporter: Jonathan LI


An example file that reproduces the issue is attached.

Tika throws an exception during text extraction from certain PowerPoint
files. In the attached example file, the extracted text stops after slide 37;
the text from slides 38-40 is missing.

Tested with both the Tika library and the Tika GUI. Apache POI (3.8 beta 3
and 3.7) extracts the text from this file without any issues.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
