[
https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651535#comment-15651535
]
Nick Burch commented on TIKA-2159:
----------------------------------
Given that we don't control all the parsers, I'm worried things may break oddly
and unexpectedly for some users if we go for #2. That said, if we throw a
form of IOException with the details the moment the parser tries to do anything
to the input stream, it might not cause too many issues.
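
As a rough illustration of the "fail as soon as the parser touches the stream"
idea, here is a minimal sketch using only java.io. The class name
FailedEmbeddedStream and its constructor arguments are made up for this
example and are not part of Tika; it only assumes that the pre-parse failure
is available as a Throwable when the stream would otherwise be handed over.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical stand-in stream (not a Tika class): every attempt to use it
 * throws an IOException that carries the original pre-parse failure.
 */
final class FailedEmbeddedStream extends InputStream {
    private final String resourceName;
    private final Throwable cause;

    FailedEmbeddedStream(String resourceName, Throwable cause) {
        this.resourceName = resourceName;
        this.cause = cause;
    }

    // Build a fresh exception per call so each stack trace points at the caller.
    private IOException failure() {
        return new IOException(
                "Embedded stream unavailable for " + resourceName, cause);
    }

    @Override
    public int read() throws IOException {
        throw failure();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        throw failure();
    }

    @Override
    public long skip(long n) throws IOException {
        throw failure();
    }

    @Override
    public int available() throws IOException {
        throw failure();
    }

    @Override
    public void reset() throws IOException {
        throw failure();
    }

    // close() is inherited as a no-op, so cleanup in a finally block or
    // try-with-resources does not mask the original error.
}
```

A parser that reads from such a stream would see an ordinary IOException whose
getCause() is the original problem, which is the kind of failure most parsers
are presumably already written to cope with.
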
{{ParsingEmbeddedDocumentExtractor}} already has some non-ideal error handling
bits, so writing some special keys onto the container might allow us to tidy
some bits of that up too if we do #1.
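
To make the "special keys on the container" idea a little more concrete, here
is a small sketch. It uses a plain Map of String to List of String as a
stand-in for the container's multi-valued metadata so that it runs without
Tika on the classpath. The class name EmbeddedFailureRecorder and the key
name are invented for illustration and are not existing Tika properties.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Tika class) for recording pre-parse failures. */
final class EmbeddedFailureRecorder {
    // Made-up key name for this sketch; a real key would need to be agreed on.
    static final String EMBEDDED_STREAM_EXCEPTION =
            "X-EXAMPLE:embedded_stream_exception";

    private EmbeddedFailureRecorder() {
    }

    /**
     * Appends the stack trace of t to the container metadata. The values are
     * kept as a list so that several failing embedded objects in the same
     * container each leave their own entry.
     */
    static void record(Map<String, List<String>> containerMetadata, Throwable t) {
        StringWriter trace = new StringWriter();
        t.printStackTrace(new PrintWriter(trace, true));
        containerMetadata
                .computeIfAbsent(EMBEDDED_STREAM_EXCEPTION, k -> new ArrayList<>())
                .add(trace.toString());
    }
}
```

With something along these lines, a caller could inspect the container's
metadata after the parse to see which embedded objects failed before they ever
reached a parser, rather than having the exception swallowed.
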
> Handle pre-parse embedded object exceptions uniformly and more robustly
> -----------------------------------------------------------------------
>
> Key: TIKA-2159
> URL: https://issues.apache.org/jira/browse/TIKA-2159
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Tim Allison
> Priority: Minor
>
> When an embedded document is parsed and causes an exception, we're currently
> catching that and swallowing it in ParsingEmbeddedDocumentExtractor (the
> default) or reporting it in the RecursiveParserWrapper by storing the
> stacktrace in the Metadata of the embedded document.
> However, if there's an exception during detection on the embedded stream or
> on getting the stream _before_ the stream hits the parser, we aren't handling
> that uniformly or robustly across parsers.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)