[ 
https://issues.apache.org/jira/browse/TIKA-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948955#comment-14948955
 ] 

Odilo Oehmichen commented on TIKA-1764:
---------------------------------------

Thanks for your response.

We are using Tika in combination with Solr Cell; there the class 
{{org.apache.tika.parser.pkg.PackageParser}} calls the 
{{ParsingEmbeddedDocumentExtractor}}. So unless we patch the Solr Cell 
source code, the given options aren't a solution for us.

To provide some context in the exception log, why not use all the data from the 
metadata object (by calling its {{toString()}} method)? In my eyes that's 
still better than not having any clue that parsing failed for some documents.
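
To make the suggestion concrete, here is a rough sketch of the kind of logging I have in mind. It deliberately uses only JDK types so that it stands on its own: {{EmbeddedParseException}}, {{EmbeddedParser}} and {{LoggingEmbeddedExtractor}} are hypothetical stand-ins (for {{TikaException}}, the delegate {{Parser}} and the extractor), a plain {{Map}} stands in for the Tika metadata object, and {{java.util.logging}} is just an assumption about which logging facility would be used. It is not the actual Tika code.

```java
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for org.apache.tika.exception.TikaException.
class EmbeddedParseException extends Exception {
    EmbeddedParseException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for the delegate parser; a Map replaces the metadata object.
interface EmbeddedParser {
    void parse(Map<String, String> metadata) throws EmbeddedParseException;
}

class LoggingEmbeddedExtractor {
    private static final Logger LOG =
            Logger.getLogger(LoggingEmbeddedExtractor.class.getName());

    // Builds the log message from the metadata's toString() output and the cause.
    static String describeFailure(Map<String, String> metadata, Exception e) {
        return "Could not parse embedded document, metadata=" + metadata
                + ", cause=" + e.getMessage();
    }

    // Returns true if the entry was parsed, false if it was skipped after a failure.
    static boolean parseEmbedded(EmbeddedParser parser, Map<String, String> metadata) {
        try {
            parser.parse(metadata);
            return true;
        } catch (EmbeddedParseException e) {
            // Still skip the content, but leave a trace instead of returning silently.
            LOG.log(Level.WARNING, describeFailure(metadata, e), e);
            return false;
        }
    }
}
```

The extractor would still skip the failing entry as it does today; the only difference is that the warning carries whatever the metadata contains (for example a resource name), so the failing document can be identified afterwards.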


> Provide information on failed document parsing in 
> ParsingEmbeddedDocumentExtractor
> ----------------------------------------------------------------------------------
>
>                 Key: TIKA-1764
>                 URL: https://issues.apache.org/jira/browse/TIKA-1764
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 1.5, 1.10
>            Reporter: Odilo Oehmichen
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The {{ParsingEmbeddedDocumentExtractor}} delegates the parsing of documents 
> to a {{Parser}}-instance.  
> If this parser fails with a {{TikaException}} the extractor class returns 
> silently:
> {code}
>  catch (TikaException e) {
>             // TODO: can we log a warning somehow?
>             // Could not parse the entry, just skip the content
>         }
> {code}
> This behaviour makes it very hard to detect problems concerning parsing.
> As the {{TODO}} in the source already states, please add some logging of the 
> exception here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
