[ 
https://issues.apache.org/jira/browse/NUTCH-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arkadi Kosmynin updated NUTCH-2603:
-----------------------------------
    Attachment: public_docs.txt

> Bring back legacy pre-Tika parsers and use them as back up parsers
> ------------------------------------------------------------------
>
>                 Key: NUTCH-2603
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2603
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.15
>            Reporter: Arkadi Kosmynin
>            Priority: Major
>         Attachments: public_docs.txt
>
>
> In some cases, legacy parsers successfully parse documents on which Tika 
> fails. I am attaching a list of examples of such documents. Nutch allows 
> several parsers to be tried on a document in sequence until one of them 
> parses it successfully. Thus, the old parsers can be combined with Tika to 
> achieve a better parsing success rate, at least until Tika is perfect.
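
The fallback idea described in the issue can be sketched roughly as follows. This is a minimal illustration only, not Nutch's actual parser API: the names FallbackParsing, DocParser and parseWithFallback are hypothetical, and a parser is assumed to signal failure either by returning an empty result or by throwing. In Nutch itself, the order in which parsing plugins are tried is configured per MIME type in conf/parse-plugins.xml.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of a parser fallback chain; not Nutch's real API. */
final class FallbackParsing {

    /** Assumed contract: returns the extracted text, or empty if parsing failed. */
    interface DocParser {
        Optional<String> parse(byte[] content);
    }

    /** Tries each parser in list order and returns the first successful result. */
    static Optional<String> parseWithFallback(List<DocParser> parsers, byte[] content) {
        for (DocParser parser : parsers) {
            try {
                Optional<String> result = parser.parse(content);
                if (result.isPresent()) {
                    return result; // first success wins; later parsers are not run
                }
            } catch (RuntimeException e) {
                // A crashing parser must not abort the chain; fall through to the next one.
            }
        }
        return Optional.empty(); // every parser failed
    }
}
```

With the primary (Tika-like) parser listed first and a legacy parser second, the legacy parser would only run on documents the primary one could not handle, so the common case costs nothing extra.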



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
