Sujen Shah created NUTCH-2047:
---------------------------------
Summary: Improvements to the relevance scoring plugin
Key: NUTCH-2047
URL: https://issues.apache.org/jira/browse/NUTCH-2047
Project: Nutch
Issue Type: Improvement
Components: scoring
Reporter: Sujen Shah
Fix For: 1.11
This issue is for discussing results from, and improvements to, the
scoring-similarity plugin, which uses the cosine similarity model.
Currently, each outlink is assigned the same score as its parent URL. This
means an irrelevant URL with a relevant parent is fetched for one more round
before it gets a lower score and is filtered out. In other words, scoring
these irrelevant URLs (from relevant parents) lower costs one additional
fetch/parse cycle.
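To make the problem concrete, here is a minimal, self-contained sketch of the
behaviour described above. It is not the plugin's code: the class
OutlinkScoreSketch, its method names, the whitespace tokenizer and the sample
texts are all assumptions made up for illustration. It only shows that an
outlink inheriting its parent's score keeps a high score until its own content
has been fetched, parsed and compared against the goal text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Nutch.
class OutlinkScoreSketch {

    // Build a simple term-frequency vector (assumed tokenizer: split on non-word chars).
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector is empty.
    static double cosineSimilarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Behaviour described in this issue: the outlink receives the parent's score unchanged.
    static double inheritedOutlinkScore(double parentScore) {
        return parentScore;
    }

    public static void main(String[] args) {
        // Sample texts are invented for this sketch.
        Map<String, Integer> goal = termFrequencies("robotics mars rover navigation");
        Map<String, Integer> parent = termFrequencies("Mars rover navigation with robotics");
        Map<String, Integer> outlink = termFrequencies("cheap flights and hotel deals");

        double parentScore = cosineSimilarity(goal, parent);
        // Round N: the outlink has not been fetched yet, so it inherits the parent's score.
        double beforeFetch = inheritedOutlinkScore(parentScore);
        // Round N+1: only after fetch/parse can the outlink be scored on its own content.
        double afterFetch = cosineSimilarity(goal, outlink);

        System.out.println("parent score:             " + parentScore);
        System.out.println("outlink score (inherited): " + beforeFetch);
        System.out.println("outlink score (own text):  " + afterFetch);
    }
}
```

In this sketch the irrelevant outlink only drops in score once its own text is
available, which is the extra fetch/parse round this issue hopes to avoid.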
Any suggestions on this are appreciated.