See the website, seriously.

Sent from my iPhone
On Jun 15, 2015, at 6:02 PM, Owen Lin <[email protected]> wrote:

Please unsubscribe!!!

On Jun 15, 2015 4:14 PM, "Sujen Shah (JIRA)" <[email protected]> wrote:

[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587113#comment-14587113 ]

Sujen Shah commented on NUTCH-2039:
-----------------------------------

Done, updated the PR.

> Relevance based scoring filter
> ------------------------------
>
> Key: NUTCH-2039
> URL: https://issues.apache.org/jira/browse/NUTCH-2039
> Project: Nutch
> Issue Type: New Feature
> Reporter: Sujen Shah
> Labels: memex, nutch
> Fix For: 1.11
>
>
> A ScoringFilter plugin that uses a similarity measure to calculate the
> similarity between a given page (gold standard) and the currently parsed page.
> The score obtained from this similarity is then distributed to its outlinks.
> This filter aims to focus the crawler on crawling/exploring relevant pages.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
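The NUTCH-2039 description above can be illustrated with a short, language-agnostic sketch. It is written in Python and is not the plugin's code. Nutch plugins are written in Java, and the actual ScoringFilter interface is not modeled here.

The sketch makes two assumptions that the issue text does not state:

- It uses cosine similarity over naively tokenized term-frequency vectors as the similarity measure.
- It splits the page's score evenly across its outlinks.

The names `term_vector`, `cosine_similarity`, and `distribute_to_outlinks` are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Hypothetical helper: lowercased whitespace-token term frequencies (deliberately naive)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing terms, so only shared terms contribute to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def distribute_to_outlinks(gold_text, page_text, outlinks):
    """Hypothetical scoring step: compare the parsed page to the gold-standard page,
    then split the resulting score evenly across the page's outlinks (an assumption)."""
    score = cosine_similarity(term_vector(gold_text), term_vector(page_text))
    if not outlinks:
        return {}
    share = score / len(outlinks)
    return {url: share for url in outlinks}


if __name__ == "__main__":
    gold = "apache nutch web crawler"
    page = "nutch is a web crawler"
    print(distribute_to_outlinks(gold, page, ["http://example.com/a", "http://example.com/b"]))
```

Outlinks of pages that closely resemble the gold standard receive larger shares. A crawler that prioritizes high-scoring URLs would therefore drift toward relevant regions of the web. A real filter would likely use a proper tokenizer and weighting scheme (for example TF-IDF) rather than raw counts.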

