Lewis John McGibbney created NUTCH-2147:
-------------------------------------------
Summary: LanguagePreferenceScoringFilter for Nutch
Key: NUTCH-2147
URL: https://issues.apache.org/jira/browse/NUTCH-2147
Project: Nutch
Issue Type: New Feature
Components: plugin, scoring
Affects Versions: 1.10
Reporter: Lewis John McGibbney
Fix For: 1.11
With a LanguagePreferenceScoringFilter, Nutch could easily be turned into a
directed crawler that follows the crawl administrator's ranking of the
languages we wish to crawl.
Right now this is not possible.
We already detect and index language within the language-identifier plugin as
well as within parse-tika IIRC; however, the presence of a language does not
currently affect the scoring of pages.
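
As a rough illustration of the idea only, the sketch below multiplies a page
score by a per-language boost taken from an administrator-supplied preference
map. Everything in it is an assumption made for illustration: the class name
LanguagePreferenceScorer, the adjust method, and the boost values are
hypothetical, and the code deliberately does not use Nutch's scoring plugin
interfaces, so it says nothing about how the actual filter would be wired in.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of Nutch: scales a score by language preference. */
class LanguagePreferenceScorer {
    // Assumed format: lower-case language code -> multiplicative boost.
    private final Map<String, Float> boosts = new HashMap<>();
    // Boost applied when the detected language has no configured preference.
    private final float defaultBoost;

    LanguagePreferenceScorer(Map<String, Float> preferences, float defaultBoost) {
        for (Map.Entry<String, Float> e : preferences.entrySet()) {
            boosts.put(normalize(e.getKey()), e.getValue());
        }
        this.defaultBoost = defaultBoost;
    }

    // Treat a missing language as the empty code so lookups never fail on null.
    private static String normalize(String lang) {
        return lang == null ? "" : lang.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns the score multiplied by the boost for the detected language. */
    float adjust(float score, String detectedLang) {
        Float boost = boosts.get(normalize(detectedLang));
        return score * (boost != null ? boost : defaultBoost);
    }
}
```

With preferences such as en=2.0 and de=0.5 and a default of 1.0, a page
detected as English would have its score doubled, a German page halved, and a
page in any other (or undetected) language left unchanged.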