Backup plan for parsing
-----------------------

                 Key: TIKA-669
                 URL: https://issues.apache.org/jira/browse/TIKA-669
             Project: Tika
          Issue Type: New Feature
          Components: parser
            Reporter: Jukka Zitting


Currently, once a document type has been detected, we direct the document to
the one parser that best matches the detected type. In practice, that parser
sometimes finds that it cannot actually parse the document, for example when
something that looked like XML turns out to have syntax errors. In such cases
it would be nice if the CompositeParser could retry parsing the document with
a more generic backup parser, such as the plain text parser for malformed XML.
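
As a rough illustration of the retry idea, here is a minimal sketch in plain
Java. It does not use Tika's actual Parser or CompositeParser API: the
SketchParser interface and the FallbackParser class are hypothetical names
invented for this example, and a parser here simply returns its extracted
text as a String, so no partial output can leak from a failed attempt.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

// Hypothetical stand-in for a parser; this is NOT Tika's Parser interface.
interface SketchParser {
    String parse(InputStream in) throws IOException;
}

// Hypothetical composite that tries each candidate in order until one succeeds.
final class FallbackParser implements SketchParser {
    private final List<SketchParser> candidates; // most specific parser first

    FallbackParser(List<SketchParser> candidates) {
        this.candidates = candidates;
    }

    @Override
    public String parse(InputStream in) throws IOException {
        // Buffer the whole input so every candidate can read it from the start.
        // InputStream.readAllBytes() requires Java 9 or later.
        byte[] buffered = in.readAllBytes();
        IOException last = null;
        for (SketchParser candidate : candidates) {
            try {
                return candidate.parse(new ByteArrayInputStream(buffered));
            } catch (IOException e) {
                last = e; // remember the failure and fall through to the next parser
            }
        }
        throw last != null ? last : new IOException("no parser configured");
    }
}
```

Buffering the full input in memory keeps the sketch short; a real
implementation would likely need a size limit or a temporary file for large
documents.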

Implementing this would require some level of buffering and redirection of both 
parser input and output. Input buffering is easy, but for output buffering we'd 
probably need to implement new ContentHandler and Metadata layers.
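
Output buffering matters because a parser that fails halfway may already
have emitted SAX events to the client's handler. One possible shape for the
ContentHandler side is sketched below, using only the SAX classes that ship
with the JDK. BufferingHandler is a hypothetical name, not an existing Tika
class. It records only element and character events, and leaves out the
Metadata layer and the remaining SAX callbacks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records SAX events instead of forwarding them, so a
// failed parse can be discarded and only a successful one is replayed.
final class BufferingHandler extends DefaultHandler {
    // One recorded event that can be replayed against any handler.
    private interface Event {
        void replay(ContentHandler target) throws SAXException;
    }

    private final List<Event> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Copy the attributes: SAX parsers may reuse the Attributes instance.
        Attributes copy = new AttributesImpl(atts);
        events.add(t -> t.startElement(uri, localName, qName, copy));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add(t -> t.endElement(uri, localName, qName));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy the characters: the caller's array is only valid during this call.
        char[] copy = Arrays.copyOfRange(ch, start, start + length);
        events.add(t -> t.characters(copy, 0, copy.length));
    }

    // Drop everything recorded so far, e.g. after the first parser failed.
    void discard() {
        events.clear();
    }

    // Replay the recorded events to the real handler once parsing succeeded.
    void commit(ContentHandler target) throws SAXException {
        for (Event e : events) {
            e.replay(target);
        }
        events.clear();
    }
}
```

A composite parser could hand each candidate a BufferingHandler, call
discard() when the candidate throws, and call commit() with the client's
handler after the first candidate that completes. The cost is that nothing
reaches the client until a parser has finished, so streaming is lost.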

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira