[ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651535#comment-15651535 ]

Nick Burch commented on TIKA-2159:
----------------------------------

Given that we don't control all the parsers, I'm worried things might break oddly
and unexpectedly for some users if we go for #2. That said, if we throw a
form of IOException with the details the moment the parser tries to do anything
with the input stream, it might not cause too many issues.
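
As a rough illustration of the #2 idea, here is a minimal sketch that assumes the extractor can still hand the parser some stream after a pre-parse failure. The class name {{DeferredFailureInputStream}} is hypothetical and not part of Tika. The stream does nothing until it is used, then throws an IOException that carries the original cause on the first read, skip, or availability check.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch of option #2: rather than skipping an embedded document whose
 * pre-parse step failed, give the parser a stream that fails on first use and
 * carries the original failure as its cause.
 */
final class DeferredFailureInputStream extends InputStream {
    private final Throwable preParseFailure;

    DeferredFailureInputStream(Throwable preParseFailure) {
        this.preParseFailure = preParseFailure;
    }

    // Build a fresh IOException each time so every caller gets a stack trace from its own call site.
    private IOException failure() {
        return new IOException(
                "Embedded stream unavailable: " + preParseFailure.getMessage(), preParseFailure);
    }

    @Override
    public int read() throws IOException {
        throw failure();
    }

    // InputStream.read(byte[]) delegates here, so both bulk-read forms fail the same way.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        throw failure();
    }

    @Override
    public long skip(long n) throws IOException {
        throw failure();
    }

    @Override
    public int available() throws IOException {
        throw failure();
    }

    // close() is inherited as a no-op, so parsers that only close the stream are unaffected.
}
```

A parser that never touches the stream sees nothing unusual. A parser that does touch it gets an ordinary IOException, which is a failure mode parsers already have to cope with. That is the property that might keep the third-party-parser risk small.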

{{ParsingEmbeddedDocumentExtractor}} already has some non-ideal error-handling
bits, so writing some special keys onto the container might also let us tidy
up some of that if we do #1.
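
For #1, here is a minimal sketch of recording pre-parse failures on the container. It uses a simple multi-valued map as a stand-in for Tika's Metadata so that it stays self-contained. {{ContainerFailureRecorder}} and both key names are hypothetical and were chosen only for illustration. They are not keys that Tika defines.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of option #1: when an embedded document fails before parsing,
 * record the failure under dedicated keys on the container's metadata.
 */
final class ContainerFailureRecorder {
    // Hypothetical key names, not Tika constants.
    static final String PRE_PARSE_RESOURCE = "X-TIKA:pre_parse_resource";
    static final String PRE_PARSE_EXCEPTION = "X-TIKA:pre_parse_exception";

    // Stand-in for the container document's multi-valued metadata.
    private final Map<String, List<String>> containerMetadata = new LinkedHashMap<>();

    /** Records the embedded resource name and full stack trace at the same index in each list. */
    void record(String embeddedName, Throwable failure) {
        StringWriter trace = new StringWriter();
        failure.printStackTrace(new PrintWriter(trace));
        add(PRE_PARSE_RESOURCE, embeddedName);
        add(PRE_PARSE_EXCEPTION, trace.toString());
    }

    private void add(String key, String value) {
        containerMetadata.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Returns every value recorded under the key, or an empty list if there are none. */
    List<String> values(String key) {
        return containerMetadata.getOrDefault(key, Collections.emptyList());
    }
}
```

Keeping the two lists index-aligned means a consumer can pair each failed resource with its trace without parsing the trace itself. Only the extractor needs to know about the keys, and individual parsers do not.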

> Handle pre-parse embedded object exceptions uniformly and more robustly
> -----------------------------------------------------------------------
>
>                 Key: TIKA-2159
>                 URL: https://issues.apache.org/jira/browse/TIKA-2159
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Tim Allison
>            Priority: Minor
>
> When an embedded document is parsed and causes an exception, we're currently 
> catching that and swallowing it in ParsingEmbeddedDocumentExtractor (the 
> default) or reporting it in the RecursiveParserWrapper by storing the 
> stacktrace in the Metadata of the embedded document.
> However, if there's an exception during detection on the embedded stream or 
> on getting the stream _before_ the stream hits the parser, we aren't handling 
> that uniformly or robustly across parsers.
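
One way to read "uniformly" in the issue description is that opening the embedded stream and running detection share a single catch site. Every pre-parse failure would then reach the same handler, whichever container parser found the embedded document. The sketch below assumes that reading. {{PreParseStage}} and its three callback interfaces are hypothetical stand-ins, not Tika's EmbeddedDocumentExtractor API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: opening the embedded stream and detecting its type share one
 * catch site, so every pre-parse failure is routed to a single handler.
 */
final class PreParseStage {
    /** Supplies the embedded stream; may fail before any parser sees it. */
    interface StreamSource {
        InputStream open() throws IOException;
    }

    /** Detects the media type; may also fail before parsing starts. */
    interface TypeDetector {
        String detect(InputStream stream) throws IOException;
    }

    /** The parse step proper; only reached when both pre-parse steps succeed. */
    interface ParseStep {
        void parse(InputStream stream, String mediaType) throws Exception;
    }

    private final Consumer<Exception> onPreParseFailure;

    PreParseStage(Consumer<Exception> onPreParseFailure) {
        this.onPreParseFailure = onPreParseFailure;
    }

    /** Returns true if the parse step ran, false if a pre-parse failure was handled. */
    boolean run(StreamSource source, TypeDetector detector, ParseStep parseStep) throws Exception {
        InputStream stream = null;
        String mediaType;
        try {
            stream = source.open();
            mediaType = detector.detect(stream);
        } catch (IOException | RuntimeException e) {
            closeQuietly(stream); // detection may fail after the stream was already opened
            onPreParseFailure.accept(e);
            return false;
        }
        // Failures inside the parse step propagate to the existing post-parse handling.
        try (InputStream s = stream) {
            parseStep.parse(s, mediaType);
        }
        return true;
    }

    private static void closeQuietly(InputStream stream) {
        if (stream == null) {
            return;
        }
        try {
            stream.close();
        } catch (IOException ignored) {
            // The original pre-parse failure is what gets reported, not this one.
        }
    }
}
```

The handler passed to the constructor is where a policy such as #1 (record on the container) or #2 (defer the failure to the parser) would plug in.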



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
