Julien Nioche created NUTCH-1403:
------------------------------------

             Summary: Add default ScoringFilter for manipulating metadata 
                 Key: NUTCH-1403
                 URL: https://issues.apache.org/jira/browse/NUTCH-1403
             Project: Nutch
          Issue Type: Improvement
            Reporter: Julien Nioche


This is currently done by the urlmeta plugin, which has too vague a name and 
contains an indexing filter that is redundant now that we have the 
index-metadata plugin. This scoring filter would help define which metadata to 
pass from:
- the crawl metadata to the content metadata
- the content metadata to the parse metadata
- the parse metadata to the crawldatum for the outlinks
I'd make this scoring filter available by default, i.e. not in a separate 
plugin, as its functionality is commonly used.
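The three transfers listed above can be sketched as follows. This is a minimal, self-contained sketch that assumes each stage copies only a configured allow-list of metadata keys. The class name, method names and example keys are hypothetical, and plain `Map<String, String>` objects stand in for the real Nutch metadata containers. In Nutch itself the stages would presumably attach to `ScoringFilter` hooks such as `passScoreBeforeParsing`, `passScoreAfterParsing` and `distributeScoreToOutlinks`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the proposed filter: String maps stand in for the
// crawl (CrawlDatum), content and parse metadata. This is NOT the Nutch API.
final class MetadataPassingSketch {

    private final List<String> crawlToContentKeys;
    private final List<String> contentToParseKeys;
    private final List<String> parseToOutlinkKeys;

    MetadataPassingSketch(List<String> crawlToContentKeys,
                          List<String> contentToParseKeys,
                          List<String> parseToOutlinkKeys) {
        this.crawlToContentKeys = List.copyOf(crawlToContentKeys);
        this.contentToParseKeys = List.copyOf(contentToParseKeys);
        this.parseToOutlinkKeys = List.copyOf(parseToOutlinkKeys);
    }

    // Copies each configured key that is present in 'from' into 'to';
    // configured keys that are absent from 'from' are skipped.
    private static void copyKeys(Map<String, String> from,
                                 Map<String, String> to,
                                 List<String> keys) {
        for (String key : keys) {
            String value = from.get(key);
            if (value != null) {
                to.put(key, value);
            }
        }
    }

    // Stage 1: crawl metadata -> content metadata (before parsing).
    void crawlToContent(Map<String, String> crawlMeta, Map<String, String> contentMeta) {
        copyKeys(crawlMeta, contentMeta, crawlToContentKeys);
    }

    // Stage 2: content metadata -> parse metadata (after parsing).
    void contentToParse(Map<String, String> contentMeta, Map<String, String> parseMeta) {
        copyKeys(contentMeta, parseMeta, contentToParseKeys);
    }

    // Stage 3: parse metadata -> metadata of the crawl entry for one outlink.
    Map<String, String> outlinkMetadata(Map<String, String> parseMeta) {
        Map<String, String> outlinkMeta = new HashMap<>();
        copyKeys(parseMeta, outlinkMeta, parseToOutlinkKeys);
        return outlinkMeta;
    }

    public static void main(String[] args) {
        // "seed.category" and "session.id" are made-up example keys.
        MetadataPassingSketch filter = new MetadataPassingSketch(
                List.of("seed.category"), List.of("seed.category"), List.of("seed.category"));

        Map<String, String> crawlMeta = new HashMap<>();
        crawlMeta.put("seed.category", "news");
        crawlMeta.put("session.id", "abc123"); // not configured, so never copied

        Map<String, String> contentMeta = new HashMap<>();
        filter.crawlToContent(crawlMeta, contentMeta);
        Map<String, String> parseMeta = new HashMap<>();
        filter.contentToParse(contentMeta, parseMeta);

        System.out.println(filter.outlinkMetadata(parseMeta)); // prints {seed.category=news}
    }
}
```

Copying only an explicit list of keys per stage keeps per-page values (such as the session identifier above) from propagating to every outlink; the issue itself does not say how the real filter would be configured.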

--
This message is automatically generated by JIRA.

        
