Arkadi Kosmynin created NUTCH-2603:
--------------------------------------

             Summary: Bring back legacy pre-Tika parsers and use them as backup parsers
                 Key: NUTCH-2603
                 URL: https://issues.apache.org/jira/browse/NUTCH-2603
             Project: Nutch
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.15
            Reporter: Arkadi Kosmynin


There are cases where legacy parsers successfully parse documents on which Tika 
fails. I am attaching a list of examples of such documents. Nutch allows more 
than one parser to be tried on a document, in sequence, until the document has 
been parsed successfully. Thus, the old parsers can be combined with Tika as 
fallbacks to achieve a better parsing success rate, at least until Tika is 
perfect.
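
To illustrate the fallback idea, here is a minimal, self-contained Java sketch 
of trying parsers in sequence until one succeeds. It does not use Nutch or Tika 
classes: the names ParserChain and parseWithFallback, and the modelling of a 
parser as a function returning an Optional, are assumptions made for this 
example only.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical illustration; not part of the Nutch or Tika APIs.
class ParserChain {

    /**
     * Tries each parser in order and returns the first non-empty result.
     * A parser signals failure by returning Optional.empty() or by throwing.
     */
    static Optional<String> parseWithFallback(
            byte[] content,
            List<Function<byte[], Optional<String>>> parsers) {
        for (Function<byte[], Optional<String>> parser : parsers) {
            try {
                Optional<String> text = parser.apply(content);
                if (text.isPresent()) {
                    return text; // first success wins; later parsers are skipped
                }
            } catch (RuntimeException e) {
                // Treat an exception as a parse failure and try the next parser.
            }
        }
        return Optional.empty(); // every parser failed
    }

    public static void main(String[] args) {
        // Stand-in for a primary parser that fails on this document.
        Function<byte[], Optional<String>> primary = c -> Optional.empty();
        // Stand-in for a legacy parser that handles the same document.
        Function<byte[], Optional<String>> legacy =
                c -> Optional.of(new String(c, StandardCharsets.UTF_8));

        byte[] doc = "sample text".getBytes(StandardCharsets.UTF_8);
        Optional<String> result = parseWithFallback(doc, List.of(primary, legacy));
        System.out.println(result.orElse("<unparsed>"));
    }
}
```

The order of the list determines priority, so in this sketch the primary 
parser stays first and a legacy parser is consulted only when it fails.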



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
