[https://issues.apache.org/jira/browse/TIKA-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658485#comment-13658485]
Lee Graber commented on TIKA-1119:
----------------------------------
The file is in the old 2003 format. If I do a simple Save As and keep the 2003
format, it still fails. If I do a Save As in the new format (pptx), it
succeeds, and the file size drops sharply (by about 42%, from 13.6MB to
7.9MB).
Let me know if you need anything else. I can't currently debug into
pic.getData() because I don't have the sources for it on my machine. I only
know that it fails on the 55th of roughly 70 pictures, and that this picture
does contain data.
> HSLFExtractor throws if PictureData is not readable
> ---------------------------------------------------
>
> Key: TIKA-1119
> URL: https://issues.apache.org/jira/browse/TIKA-1119
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.3
> Environment: Tested on Mac and Ubuntu server
> Reporter: Lee Graber
>
> Unfortunately the repro file contains customer-sensitive information, and
> modifying it makes the problem stop reproducing.
> In handleSlideEmbeddedPictures, the pic.getData() call can throw (in my case
> I got "javax.imageio.IIOException: Error reading PNG image data"). Ideally
> the parser would not hit this error at all, but should a single unreadable
> picture cause the whole parsing stage to fail? The file itself opens fine in
> Office.
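The question above is whether one bad picture should abort the whole parse. A minimal sketch of the tolerant alternative follows: read each picture inside its own try/catch, record the failure, and keep going. This is not Tika's actual fix. PictureSource, Result and extractAll are hypothetical names, and PictureSource only stands in for the POI picture object whose getData() threw; it is not the real HSLF API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.IIOException;

public class TolerantPictureExtractor {

    /** Hypothetical stand-in for an embedded picture whose data may be unreadable (not the POI API). */
    public interface PictureSource {
        byte[] getData() throws IOException;
    }

    /** Hypothetical result holder: payloads that were read, plus one message per failed picture. */
    public static final class Result {
        public final List<byte[]> extracted = new ArrayList<>();
        public final List<String> failures = new ArrayList<>();
    }

    /** Reads every picture; a failure on one picture is recorded and the loop continues. */
    public static Result extractAll(List<? extends PictureSource> pictures) {
        Result result = new Result();
        for (int i = 0; i < pictures.size(); i++) {
            try {
                result.extracted.add(pictures.get(i).getData());
            } catch (IOException | RuntimeException e) {
                // IIOException extends IOException, so a corrupt PNG lands here.
                // Record it with a 1-based index (as a user counts pictures) and move on.
                result.failures.add("picture " + (i + 1) + ": " + e.getMessage());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PictureSource ok = () -> new byte[] {1, 2, 3};
        PictureSource broken = () -> { throw new IIOException("Error reading PNG image data"); };
        PictureSource alsoOk = () -> new byte[] {4};

        Result r = extractAll(List.of(ok, broken, alsoOk));
        // Prints: 2 extracted, [picture 2: Error reading PNG image data]
        System.out.println(r.extracted.size() + " extracted, " + r.failures);
    }
}
```

Whether a skipped picture should be reported silently, logged, or surfaced as metadata is a separate design choice. The sketch only shows the per-picture isolation.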