[ 
https://issues.apache.org/jira/browse/TIKA-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658485#comment-13658485
 ] 

Lee Graber commented on TIKA-1119:
----------------------------------

The file is in the old 2003 format (ppt). If I do a simple Save As and leave it 
in the 2003 format, parsing still fails. If I do Save As and save in the new 
format (pptx), parsing succeeds. The file size also drops massively (about 42%: 
13.6 MB -> 7.9 MB). 

Let me know if you need anything else. I can't currently step into 
pic.getData() in a debugger, as I don't have the sources for it on my machine. 
I only know that it is the 55th picture out of 70-something and that it does 
contain some data.
                
> HSLFExtractor throws if PictureData is not readable
> ---------------------------------------------------
>
>                 Key: TIKA-1119
>                 URL: https://issues.apache.org/jira/browse/TIKA-1119
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.3
>         Environment: MAC and Ubuntu server tested
>            Reporter: Lee Graber
>
> Unfortunately, the repro file contains customer-sensitive information, and 
> modifying it has eliminated the repro.
> In handleSlideEmbeddedPictures, the pic.getData() call can throw (in my case 
> I got "javax.imageio.IIOException: Error reading PNG image data"). Ideally 
> the parser would not trigger this exception at all, but should one unreadable 
> picture cause the whole parse to fail? The file itself opens fine in Office.
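
To make the question above concrete, here is a minimal sketch of the "skip the 
unreadable picture and keep going" behavior. It is not the Tika or POI code: 
Picture, Result, and extractReadable are hypothetical stand-ins invented for 
illustration, and the only thing assumed about the real pic.getData() is that 
it can throw.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class PictureSkipSketch {

    /** Hypothetical stand-in for an embedded picture; not the POI class. */
    interface Picture {
        byte[] getData() throws Exception;
    }

    /** Readable payloads plus the indexes of pictures that could not be read. */
    static final class Result {
        final List<byte[]> readable = new ArrayList<>();
        final List<Integer> failedIndexes = new ArrayList<>();
    }

    /** Reads every picture, recording failures instead of aborting the loop. */
    static Result extractReadable(List<Picture> pictures) {
        Result result = new Result();
        for (int i = 0; i < pictures.size(); i++) {
            try {
                result.readable.add(pictures.get(i).getData());
            } catch (Exception e) {
                // One bad picture should not fail the whole pass:
                // remember which one it was and move on to the next.
                result.failedIndexes.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Picture> pictures = new ArrayList<>();
        pictures.add(() -> new byte[] {1, 2, 3});
        // Simulates the failure from the report with a plain IOException.
        pictures.add(() -> { throw new IOException("Error reading PNG image data"); });
        pictures.add(() -> new byte[] {4});

        Result r = extractReadable(pictures);
        System.out.println("readable=" + r.readable.size()
                + " failed=" + r.failedIndexes);
    }
}
```

Recording the failing index also gives a cheap way to confirm which picture is 
the problem (the 55th, in this report) without stepping into getData(). Whether 
a real parser should swallow the exception silently, log it, or surface it as 
metadata is a separate design decision that this sketch does not settle.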
