[ 
https://issues.apache.org/jira/browse/TIKA-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Palsulich closed TIKA-669.
--------------------------------
    Resolution: Duplicate

> Backup plan for parsing
> -----------------------
>
>                 Key: TIKA-669
>                 URL: https://issues.apache.org/jira/browse/TIKA-669
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>
> Currently, once a document type has been detected, we direct the document to
> the one parser that best matches the detected type. In practice there are
> cases where that parser finds that it cannot in fact parse the document, for
> example when something that looked like XML turns out to have syntax errors.
> In such cases it would be nice if the CompositeParser could retry
> parsing the document with a more generic backup parser, such as the plain text
> parser for malformed XML.
> Implementing this would require some level of buffering and redirection of 
> both parser input and output. Input buffering is easy, but for output 
> buffering we'd probably need to implement new ContentHandler and Metadata 
> layers.
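
The buffering idea described in the quoted issue can be illustrated with a
minimal, JDK-only sketch. It is not Tika code: SimpleParser, XML, TEXT and
parseWithFallback are hypothetical names invented for this example, and a
StringBuilder stands in for the ContentHandler and Metadata layers the issue
mentions. The input is read into memory once so each parser sees it from the
start, and each attempt writes into its own buffer so partial output from a
failed parser is discarded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class FallbackSketch {

    /** Hypothetical minimal parser contract; this is NOT Tika's Parser interface. */
    interface SimpleParser {
        void parse(InputStream in, StringBuilder out) throws Exception;
    }

    /** Strict XML parser: collects character data and throws on malformed input. */
    static final SimpleParser XML = (in, out) -> {
        SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        });
    };

    /** Generic backup parser: treats the whole input as UTF-8 plain text. */
    static final SimpleParser TEXT = (in, out) ->
            out.append(new String(in.readAllBytes(), StandardCharsets.UTF_8));

    /**
     * Tries each parser in order and returns the output of the first one
     * that succeeds. Assumes the document fits in memory.
     */
    static String parseWithFallback(InputStream in, List<SimpleParser> parsers)
            throws IOException {
        // Input buffering: read once so every parser can start from byte 0.
        byte[] buffered = in.readAllBytes();
        Exception last = null;
        for (SimpleParser parser : parsers) {
            // Output buffering: a fresh buffer per attempt, so anything a
            // failing parser already emitted never reaches the caller.
            StringBuilder attempt = new StringBuilder();
            try {
                parser.parse(new ByteArrayInputStream(buffered), attempt);
                return attempt.toString();
            } catch (Exception e) {
                last = e; // remember the failure and try the next parser
            }
        }
        throw new IOException("All parsers failed", last);
    }

    public static void main(String[] args) throws IOException {
        List<SimpleParser> chain = List.of(XML, TEXT);
        byte[] good = "<a>hello</a>".getBytes(StandardCharsets.UTF_8);
        byte[] bad = "<a>hello".getBytes(StandardCharsets.UTF_8);
        // Well-formed XML is handled by the XML parser.
        System.out.println(parseWithFallback(new ByteArrayInputStream(good), chain));
        // Malformed XML falls through to the plain text parser.
        System.out.println(parseWithFallback(new ByteArrayInputStream(bad), chain));
    }
}
```

In this sketch the malformed document comes back as its raw text, because
the XML attempt fails and its buffer is dropped. A real implementation would
have to buffer SAX events and metadata rather than a string, and would
likely need to spill large inputs to disk instead of holding them in memory.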



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
