Tim Allison created TIKA-1612:
---------------------------------
Summary: Exceptions getting image data in PPT files
Key: TIKA-1612
URL: https://issues.apache.org/jira/browse/TIKA-1612
Project: Tika
Issue Type: Bug
Reporter: Tim Allison
Priority: Minor
In numerous (~500) PPT files in govdocs1, we're getting zip exceptions (unknown
compression method, bad block, etc.) when Tika's HSLFExtractor calls
{{getData()}} on an embedded image.
Under normal circumstances (I just learned today...), if an attachment causes a
RuntimeException, we are currently swallowing that in
{{ParsingEmbeddedDocumentExtractor}}.
However, because we're calling {{getData()}} before the embedded extractor
takes over, an exception thrown there is not swallowed, and the parse of the
entire file fails.
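One possible direction, sketched below, is to guard the {{getData()}} call itself so that a bad image is skipped rather than aborting the whole parse. This is only an illustration under assumptions: {{Picture}}, {{EmbeddedHandler}}, and {{extractAll}} are hypothetical stand-ins written for this sketch, not the actual POI or Tika classes, and the real fix may look quite different.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class GuardedImageExtraction {

    /** Hypothetical stand-in for an embedded picture; NOT the POI class. */
    public interface Picture {
        byte[] getData() throws IOException;
    }

    /** Hypothetical stand-in for handing bytes to the embedded-document extractor. */
    public interface EmbeddedHandler {
        void handle(byte[] data);
    }

    /**
     * Reads each picture's bytes inside a try/catch so that one corrupt
     * image cannot fail the parse of the entire file. Failures are recorded
     * in {@code skipped} instead of being rethrown.
     *
     * @return the number of pictures successfully passed to the handler
     */
    public static int extractAll(List<Picture> pictures,
                                 EmbeddedHandler handler,
                                 List<Exception> skipped) {
        int handled = 0;
        for (Picture picture : pictures) {
            byte[] data;
            try {
                // The call that currently runs outside any guard.
                data = picture.getData();
            } catch (IOException | RuntimeException e) {
                // Assumption: skipping and recording is acceptable here.
                skipped.add(e);
                continue;
            }
            handler.handle(data);
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        List<Picture> pictures = new ArrayList<>();
        pictures.add(() -> new byte[] {1, 2, 3});
        pictures.add(() -> {
            throw new IllegalStateException("unknown compression method");
        });
        pictures.add(() -> new byte[] {4});

        List<Exception> skipped = new ArrayList<>();
        int handled = extractAll(pictures, data -> { }, skipped);
        System.out.println(handled + " handled, " + skipped.size() + " skipped");
    }
}
```

Recording the skipped exceptions, rather than silently dropping them, would keep the failures visible for later reporting while still letting the rest of the file parse.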