Remove deprecated parse plugins
-------------------------------
Key: NUTCH-836
URL: https://issues.apache.org/jira/browse/NUTCH-836
Project: Nutch
Issue Type: Task
Components: parser
Affects Versions: 1.1
Reporter: Julien Nioche
Assignee: Julien Nioche
Fix For: 2.0
Attachments: NUTCH-836.patch
Some of the parser plugins in 1.1 duplicate functionality that the parse-tika
plugin already provides. These plugins were kept in 1.1 but should be removed
from 2.0, where we'll rely on parse-tika almost exclusively. Some existing
plugins might be kept when there is no equivalent in Tika (to be discussed).
The patch removes the following plugins:
* parse-html
* parse-msexcel
* parse-mspowerpoint
* parse-msword
* parse-pdf
* parse-oo
* parse-text
* lib-jakarta-poi
* lib-parsems
The patch does not (yet) remove:
* parse-js
* parse-rss
* parse-swf
* parse-zip
* feed
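For anyone checking whether a local configuration still refers to the plugins
removed above, here is a minimal sketch in Python. It assumes the
plugin.includes value is a regular expression matched against whole plugin
ids. The function name and the example value are hypothetical and are not
part of the patch.

```python
import re

# Plugins removed by the NUTCH-836 patch (copied from the list above).
REMOVED_PLUGINS = [
    "parse-html", "parse-msexcel", "parse-mspowerpoint", "parse-msword",
    "parse-pdf", "parse-oo", "parse-text", "lib-jakarta-poi", "lib-parsems",
]


def still_referenced(plugin_includes):
    """Return the removed plugin ids that the given regex matches in full.

    Assumption: plugin ids are compared against the whole expression,
    not searched for as substrings.
    """
    pattern = re.compile(plugin_includes)
    return [p for p in REMOVED_PLUGINS if pattern.fullmatch(p)]


if __name__ == "__main__":
    # Hypothetical plugin.includes value, for illustration only.
    example = "protocol-http|urlfilter-regex|parse-(html|text|tika)|index-basic"
    print(still_referenced(example))
```

Any id the function returns would need to be dropped from the expression, or
replaced by parse-tika, once the patch is applied.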
Please review the patch and vote for its inclusion in the trunk.