[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748104#comment-15748104 ]
Nick Burch commented on TIKA-2208:
----------------------------------
Rather than doing it in code, what happens if you specify a Tika Config XML
file that lists only the parsers you want, together with an "ignore" load
error handler? See
http://tika.apache.org/1.14/configuring.html#Configuring_Parsers and
http://tika.apache.org/1.14/configuring.html#Load_Error_Handling
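
For illustration, a minimal sketch of such a config, assuming the parser
list from the issue description below and a hypothetical file name
tika-config.xml. Whether this actually avoids the NoClassDefFoundError
for embedded documents is untested here:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <!-- Ignore problems when loading services instead of warning or throwing -->
  <service-loader loadErrorHandler="IGNORE"/>
  <parsers>
    <!-- Only the parsers from the reporter's list -->
    <parser class="org.apache.tika.parser.html.HtmlParser"/>
    <parser class="org.apache.tika.parser.rtf.RTFParser"/>
    <parser class="org.apache.tika.parser.pdf.PDFParser"/>
    <parser class="org.apache.tika.parser.txt.TXTParser"/>
    <parser class="org.apache.tika.parser.microsoft.OfficeParser"/>
    <parser class="org.apache.tika.parser.microsoft.OldExcelParser"/>
    <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser"/>
    <parser class="org.apache.tika.parser.odf.OpenDocumentParser"/>
    <parser class="org.apache.tika.parser.iwork.IWorkPackageParser"/>
    <parser class="org.apache.tika.parser.xml.DcXMLParser"/>
    <parser class="org.apache.tika.parser.epub.EpubParser"/>
  </parsers>
</properties>
{code}
The file could then be loaded with new TikaConfig("tika-config.xml") and
the resulting config passed to the Tika or AutoDetectParser constructor.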
> Catch missing libraries
> -----------------------
>
> Key: TIKA-2208
> URL: https://issues.apache.org/jira/browse/TIKA-2208
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: David Pilato
>
> Hi there
> We have decided to remove support for some formats when using Tika to extract
> text and metadata.
> We defined our list of Parsers:
> {code:java}
> // imports needed by the fields below
> import org.apache.tika.Tika;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.Parser;
>
> // explicit list of the only parsers we want to support
> private static final Parser[] PARSERS = new Parser[] {
>     // documents
>     new org.apache.tika.parser.html.HtmlParser(),
>     new org.apache.tika.parser.rtf.RTFParser(),
>     new org.apache.tika.parser.pdf.PDFParser(),
>     new org.apache.tika.parser.txt.TXTParser(),
>     new org.apache.tika.parser.microsoft.OfficeParser(),
>     new org.apache.tika.parser.microsoft.OldExcelParser(),
>     new org.apache.tika.parser.microsoft.ooxml.OOXMLParser(),
>     new org.apache.tika.parser.odf.OpenDocumentParser(),
>     new org.apache.tika.parser.iwork.IWorkPackageParser(),
>     new org.apache.tika.parser.xml.DcXMLParser(),
>     new org.apache.tika.parser.epub.EpubParser(),
> };
> // auto-detecting parser restricted to the list above
> private static final AutoDetectParser PARSER_INSTANCE = new AutoDetectParser(PARSERS);
> private static final Tika TIKA_INSTANCE = new Tika(PARSER_INSTANCE.getDetector(), PARSER_INSTANCE);
> {code}
> But when an MS Office Word document embeds a document in an unsupported
> format (like a Visio schema), a {{NoClassDefFoundError}} is raised.
> Would it be possible to catch that case and throw a {{TikaException}}
> instead, so that it surfaces as an Exception rather than only as a Throwable?