[
https://issues.apache.org/jira/browse/TIKA-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948955#comment-14948955
]
Odilo Oehmichen commented on TIKA-1764:
---------------------------------------
Thanks for your response.
We are using Tika in combination with Solr Cell; there the class
{{org.apache.tika.parser.pkg.PackageParser}} calls the
{{ParsingEmbeddedDocumentExtractor}}. So unless we patch the Solr Cell
source code, the given options aren't a solution for us.
To provide some context in the exception log, why not include all the data from
the metadata object (by calling its {{toString()}} method)? In my eyes that's
still better than having no clue that parsing failed for some documents.
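A rough sketch of what I have in mind follows. It is not the actual Tika code: {{ParseFailure}} and {{EmbeddedParser}} are hypothetical stand-ins for {{TikaException}} and the delegate {{Parser}}, a plain {{Map}} stands in for the metadata object, the {{resourceName}} key is only an illustrative entry, and {{java.util.logging}} is used merely because it needs no extra dependency.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

public class EmbeddedParseLoggingSketch {

    /** Hypothetical stand-in for TikaException. */
    static class ParseFailure extends Exception {
        ParseFailure(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the delegate Parser instance. */
    interface EmbeddedParser {
        void parse(Map<String, String> metadata) throws ParseFailure;
    }

    private static final Logger LOG =
            Logger.getLogger(EmbeddedParseLoggingSketch.class.getName());

    /**
     * Delegates to the parser. On failure the content is still skipped,
     * but a warning carrying the metadata's string form is logged first.
     * Returns true if parsing succeeded, false if it was skipped.
     */
    static boolean parseEmbedded(EmbeddedParser parser, Map<String, String> metadata) {
        try {
            parser.parse(metadata);
            return true;
        } catch (ParseFailure e) {
            // Assumption: the metadata's toString() output is enough to identify the entry.
            LOG.log(Level.WARNING,
                    "Could not parse embedded document, skipping content. Metadata: " + metadata,
                    e);
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("resourceName", "broken.doc"); // illustrative entry only

        // A parser that always fails, to trigger the warning path.
        boolean parsed = parseEmbedded(m -> {
            throw new ParseFailure("corrupt entry");
        }, metadata);
        System.out.println("parsed=" + parsed);
    }
}
```

The behaviour for callers such as Solr Cell would stay the same (the entry is skipped), only the failure would no longer be invisible in the logs.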
> Provide information on failed document parsing in
> ParsingEmbeddedDocumentExtractor
> ----------------------------------------------------------------------------------
>
> Key: TIKA-1764
> URL: https://issues.apache.org/jira/browse/TIKA-1764
> Project: Tika
> Issue Type: Improvement
> Affects Versions: 1.5, 1.10
> Reporter: Odilo Oehmichen
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The {{ParsingEmbeddedDocumentExtractor}} delegates the parsing of documents
> to a {{Parser}}-instance.
> If this parser fails with a {{TikaException}}, the extractor class returns
> silently:
> {code}
> catch (TikaException e) {
>     // TODO: can we log a warning somehow?
>     // Could not parse the entry, just skip the content
> }
> {code}
> This behaviour makes it very hard to detect problems concerning parsing.
> As the {{TODO}} in the source already states, please add some logging of the
> exception here.