[ 
https://issues.apache.org/jira/browse/TIKA-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584176#comment-13584176
 ] 

Michael McCandless commented on TIKA-1074:
------------------------------------------

Thanks Jukka.

InterruptedException is never thrown in these places today, so I can't add a 
separate catch clause for it (the compiler rejects a catch of a checked 
exception that the try block cannot throw).

So, the instanceof check for IE is there in case we do start handling 
interrupts in these places in the future ... we could just remove it and add 
it back if we ever add IE, but that seems risky.

Or I can change that code to throw TikaException on interrupt instead (and 
restore the interrupt bit), except that in the TikaCLI case 
EmbeddedDocumentExtractor.parseEmbedded doesn't throw TikaException today (the 
other two places already do).  But it's a little weird to throw TikaException 
in response to an interrupt (ie, the code above will be trying to catch an 
IE) ... I think it's cleaner to set the interrupt bit and let the next place 
that waits see the interrupt bit and throw IE?
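
To make the last option concrete, here is a minimal sketch of the pattern, not 
the actual patch. The names InterruptSketch, EmbeddedVisitor, visitAll and 
restoreInterruptIfNeeded are hypothetical and are not Tika APIs; the visitor 
declares only IOException, to mimic a call site where InterruptedException is 
never declared.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Tika.
class InterruptSketch {

    /** Stand-in for visiting one embedded document; declares IOException only. */
    public interface EmbeddedVisitor {
        void visit(String name) throws IOException;
    }

    /**
     * If t is an InterruptedException, set the current thread's interrupt bit
     * again so the next blocking call can see it and throw IE itself.
     */
    public static boolean restoreInterruptIfNeeded(Throwable t) {
        if (t instanceof InterruptedException) {
            Thread.currentThread().interrupt();
            return true;
        }
        return false;
    }

    /** Visits every name, recording failures instead of aborting the loop. */
    public static List<Exception> visitAll(List<String> names, EmbeddedVisitor visitor) {
        List<Exception> failures = new ArrayList<>();
        for (String name : names) {
            try {
                visitor.visit(name);
            } catch (Exception e) {
                // A separate "catch (InterruptedException e)" clause would not
                // compile here, because visit() never declares it; hence the
                // instanceof check inside the helper.
                restoreInterruptIfNeeded(e);
                failures.add(e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Exception> failures = visitAll(
                Arrays.asList("a.doc", "bad.doc", "c.doc"),
                name -> {
                    if (name.startsWith("bad")) {
                        throw new IOException("corrupt embedded document: " + name);
                    }
                });
        System.out.println("failures recorded: " + failures.size());
    }
}
```

The sketch keeps going after a failure and leaves it to the caller to decide 
what to do with the recorded exceptions; whether the loop should also stop 
once the interrupt bit is set is a separate design choice it does not settle.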
                
> Extraction should continue if an exception is hit visiting an embedded 
> document
> -------------------------------------------------------------------------------
>
>                 Key: TIKA-1074
>                 URL: https://issues.apache.org/jira/browse/TIKA-1074
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 1.4
>
>         Attachments: TIKA-1074.patch, TIKA-1074.patch
>
>
> Spinoff from TIKA-1072.
> In that issue, a problematic document (still not sure if document is corrupt, 
> or possible POI bug) caused an exception when visiting the embedded documents.
> If I change Tika to suppress that exception, the rest of the document 
> extracts fine.
> So somehow I think we should be more robust here, and maybe log the 
> exception, or save/record the exception(s) somewhere so after parsing the app 
> could decide what to do about them ...
