[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587113#comment-14587113 ]
Sujen Shah commented on NUTCH-2039:
-----------------------------------

Done, updated the PR.

> Relevance based scoring filter
> ------------------------------
>
>                 Key: NUTCH-2039
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2039
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Sujen Shah
>              Labels: memex, nutch
>             Fix For: 1.11
>
>
> A ScoringFilter plugin that uses a similarity measure to calculate the
> similarity between a given page (gold standard) and the currently parsed page.
> The score obtained from this similarity is then distributed to its outlinks.
> This filter aims to focus the crawler on crawling and exploring relevant pages.
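The issue description does not say which similarity measure the plugin uses or how the score is split among outlinks. The following is a minimal standalone sketch that assumes cosine similarity over term-frequency vectors and an even split of the resulting score across outlinks. The class and method names (`RelevanceScoreSketch`, `termFrequencies`, `cosineSimilarity`, `distributeToOutlinks`) are hypothetical. The sketch does not implement Nutch's actual `ScoringFilter` interface.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: relevance of a parsed page to a "gold standard" page,
 * measured here (as an assumption) by cosine similarity of term-frequency
 * vectors, then split evenly across the page's outlinks.
 * This is not the NUTCH-2039 plugin code.
 */
public class RelevanceScoreSketch {

    // Build a term-frequency vector from lower-cased text split on non-word characters.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity of two term-frequency vectors; 0.0 if either vector is empty.
    static double cosineSimilarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Split the page's score evenly among its outlinks (duplicate URLs accumulate shares).
    static Map<String, Double> distributeToOutlinks(double score, List<String> outlinks) {
        Map<String, Double> result = new HashMap<>();
        if (outlinks.isEmpty()) {
            return result;
        }
        double share = score / outlinks.size();
        for (String url : outlinks) {
            result.merge(url, share, Double::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> gold = termFrequencies("apache nutch web crawler");
        Map<String, Integer> page = termFrequencies("nutch is a web crawler");
        double score = cosineSimilarity(gold, page);
        System.out.println("similarity = " + score);
        System.out.println(distributeToOutlinks(score,
                List.of("http://a.example/", "http://b.example/")));
    }
}
```

An even split is only one possible choice; a real filter might instead weight outlinks differently, for example by how relevant their anchor text is.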