[ https://issues.apache.org/jira/browse/TIKA-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tyler Palsulich closed TIKA-669.
--------------------------------
Resolution: Duplicate
> Backup plan for parsing
> -----------------------
>
> Key: TIKA-669
> URL: https://issues.apache.org/jira/browse/TIKA-669
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
>
> Currently, once a document type has been detected, we direct the document to
> the one parser that best matches the detected type. In practice there are
> cases where that parser finds that it cannot in fact parse the document, for
> example when something that looked like XML turns out to have syntax errors.
> For such cases it would be nice if the CompositeParser could then retry
> parsing the document with a more generic backup parser, such as the plain
> text parser for malformed XML.
>
> Implementing this would require some level of buffering and redirection of
> both parser input and output. Input buffering is easy, but for output
> buffering we'd probably need to implement new ContentHandler and Metadata
> layers.
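
The retry idea in the description can be illustrated with a minimal, self-contained sketch. This is not Tika's API: SketchParser, XmlTextParser, PlainTextParser and parseWithFallback are hypothetical names invented for this example, and output is simplified to a StringBuilder rather than ContentHandler events and Metadata. The sketch assumes Java 9 or later for InputStream.readAllBytes. It buffers the input so that it can be replayed, and gives each attempt its own output buffer so that partial output from a failed primary parser is discarded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; none of these types are Tika classes.
class FallbackParsingSketch {

    /** Hypothetical stand-in for a parser: reads a stream, appends extracted text. */
    interface SketchParser {
        void parse(InputStream in, StringBuilder out) throws Exception;
    }

    /** Strict parser: collects character data via SAX, fails on malformed XML. */
    static class XmlTextParser implements SketchParser {
        @Override
        public void parse(InputStream in, StringBuilder out) throws Exception {
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    // Text can be emitted before a later syntax error is found.
                    out.append(ch, start, length);
                }
            };
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        }
    }

    /** Generic backup parser: treats the bytes as UTF-8 plain text. */
    static class PlainTextParser implements SketchParser {
        @Override
        public void parse(InputStream in, StringBuilder out) throws IOException {
            out.append(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }

    /** Tries the primary parser first and retries with the backup on any failure. */
    static String parseWithFallback(InputStream in, SketchParser primary,
                                    SketchParser backup) {
        final byte[] buffered;
        try {
            buffered = in.readAllBytes(); // input buffering: keep bytes for replay
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder attempt = new StringBuilder(); // output buffer, first try
        try {
            primary.parse(new ByteArrayInputStream(buffered), attempt);
            return attempt.toString();
        } catch (Exception primaryFailure) {
            // Drop the partial output and replay the same bytes to the backup.
            StringBuilder retry = new StringBuilder();
            try {
                backup.parse(new ByteArrayInputStream(buffered), retry);
            } catch (Exception backupFailure) {
                backupFailure.addSuppressed(primaryFailure);
                throw new IllegalStateException("both parsers failed", backupFailure);
            }
            return retry.toString();
        }
    }

    public static void main(String[] args) {
        String malformed = "<doc>hello <b>world</doc>"; // <b> is never closed
        InputStream in =
                new ByteArrayInputStream(malformed.getBytes(StandardCharsets.UTF_8));
        System.out.println(
                parseWithFallback(in, new XmlTextParser(), new PlainTextParser()));
    }
}
```

In this sketch the XML attempt may already have appended some text before the mismatched end tag is detected, which is why each attempt writes to a separate buffer. A real implementation along the lines the description suggests would presumably need to record and replay SAX events and metadata changes instead of plain text, and would have to bound the amount of input held in memory.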