[ https://issues.apache.org/jira/browse/NUTCH-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel updated NUTCH-2147:
-----------------------------------
    Fix Version/s:     (was: 1.14)
                       1.15

> MetadataScoringFilter for Nutch
> -------------------------------
>
>                 Key: NUTCH-2147
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2147
>             Project: Nutch
>          Issue Type: New Feature
>          Components: plugin, scoring
>    Affects Versions: 1.10
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.15
>
>
> This issue originally started by envisioning an implementation of a
> LanguagePreferenceScoringFilter so that Nutch could easily be made into a
> directed crawler based on the crawl administrator's ranking preferences for
> the languages we wish to crawl.
> Right now this is not possible.
> We already detect and index language within the language-identifier plugin as
> well as within parse-tika IIRC; however, the presence of a language
> currently does not affect the scoring of pages.
> The scope of this issue has changed to make it more generally applicable to
> a wider variety of use cases. It will therefore take advantage of
> NUTCH-1980 by pulling (amongst other things) Language entries from the
> CrawlDB Metadata.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)