Julien Nioche created NUTCH-1403:
------------------------------------
Summary: Add default ScoringFilter for manipulating metadata
Key: NUTCH-1403
URL: https://issues.apache.org/jira/browse/NUTCH-1403
Project: Nutch
Issue Type: Improvement
Reporter: Julien Nioche
This is currently done by the urlmeta plugin, whose name is too vague and whose
indexing filter is redundant now that we have the index-metadata plugin. This
scoring filter would help define which metadata to pass from:
- the crawl metadata to the content metadata
- the content metadata to the parse metadata
- the parse metadata to the CrawlDatum for the outlinks
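The three hand-offs above could look roughly like the following sketch, which assumes that each stage forwards a configured list of metadata keys. Plain string maps stand in for Nutch's crawl, content and parse metadata. The class, its methods and the per-stage key lists are hypothetical and do not reproduce the Nutch ScoringFilter interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the metadata hand-offs described in NUTCH-1403.
 * Plain string maps stand in for Nutch's crawl, content and parse metadata;
 * this is NOT the Nutch ScoringFilter interface.
 */
public class MetadataPassingSketch {

    // Hypothetical per-stage lists of metadata keys to forward.
    private final List<String> crawlToContentKeys;
    private final List<String> contentToParseKeys;
    private final List<String> parseToOutlinkKeys;

    public MetadataPassingSketch(List<String> crawlToContentKeys,
                                 List<String> contentToParseKeys,
                                 List<String> parseToOutlinkKeys) {
        this.crawlToContentKeys = new ArrayList<>(crawlToContentKeys);
        this.contentToParseKeys = new ArrayList<>(contentToParseKeys);
        this.parseToOutlinkKeys = new ArrayList<>(parseToOutlinkKeys);
    }

    // Copies each listed key that is present in 'from' into 'to'; unlisted keys are ignored.
    private static void copyKeys(Map<String, String> from,
                                 Map<String, String> to,
                                 List<String> keys) {
        for (String key : keys) {
            String value = from.get(key);
            if (value != null) {
                to.put(key, value);
            }
        }
    }

    // Stage 1: crawl metadata -> content metadata.
    public void crawlToContent(Map<String, String> crawlMeta, Map<String, String> contentMeta) {
        copyKeys(crawlMeta, contentMeta, crawlToContentKeys);
    }

    // Stage 2: content metadata -> parse metadata.
    public void contentToParse(Map<String, String> contentMeta, Map<String, String> parseMeta) {
        copyKeys(contentMeta, parseMeta, contentToParseKeys);
    }

    // Stage 3: parse metadata -> a fresh metadata map for each outlink's new crawl entry.
    public List<Map<String, String>> parseToOutlinks(Map<String, String> parseMeta, int outlinkCount) {
        List<Map<String, String>> outlinkMeta = new ArrayList<>();
        for (int i = 0; i < outlinkCount; i++) {
            Map<String, String> meta = new HashMap<>();
            copyKeys(parseMeta, meta, parseToOutlinkKeys);
            outlinkMeta.add(meta);
        }
        return outlinkMeta;
    }
}
```

For example, if every stage is configured with only the key `lang`, a crawl entry carrying `lang` and some other key would pass just `lang` through to the content metadata, the parse metadata and each outlink. Keeping a separate key list per stage lets each hand-off be configured independently.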
I'd make this scoring filter available by default, i.e. not in a separate plugin,
as its functionality is commonly used.