Kim Whitehall created NUTCH-2125:
------------------------------------
Summary: Metrics
Key: NUTCH-2125
URL: https://issues.apache.org/jira/browse/NUTCH-2125
Project: Nutch
Issue Type: Improvement
Components: tool
Affects Versions: 1.10
Reporter: Kim Whitehall
Purpose: a metric for determining the “relevancy” of a crawl after each
round and the “relevancy” of a page. NB: this is not a scoring plugin. By
default, the first 25 terms will be stored.
- Return the topN terms per page
- Return the topN terms per segment based on tf-idf
- Leverage Apache Lucene libs
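
The two "topN" items above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed tool: it uses plain Java collections instead of Lucene, the class and method names (TopTermsSketch, topNPerPage, topNPerSegment) are hypothetical, the smoothed idf formula ln(1 + N/df) is one common variant chosen here for illustration, and ties are broken alphabetically. Only the default of 25 terms comes from the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of Nutch or Lucene.
class TopTermsSketch {

    // Default number of terms to keep, as stated in the issue description.
    static final int DEFAULT_TOP_N = 25;

    // Naive tokenizer: lowercase, split on anything that is not a letter or digit.
    // A real implementation would presumably use a Lucene analyzer instead.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Raw term counts for one page.
    static Map<String, Double> termFrequencies(String text) {
        Map<String, Double> tf = new HashMap<>();
        for (String t : tokenize(text)) {
            tf.merge(t, 1.0, Double::sum);
        }
        return tf;
    }

    // Highest score first; ties broken alphabetically (an assumption).
    static List<String> topN(Map<String, Double> scores, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    // Top terms of a single page, ranked by raw term frequency.
    static List<String> topNPerPage(String pageText, int n) {
        return topN(termFrequencies(pageText), n);
    }

    // Top terms of a segment (a list of page texts), ranked by tf-idf.
    // score(term) = (total count in segment) * ln(1 + N / df), where N is the
    // number of pages and df the number of pages containing the term.
    static List<String> topNPerSegment(List<String> pages, int n) {
        Map<String, Double> totalTf = new HashMap<>();
        Map<String, Integer> df = new HashMap<>();
        for (String page : pages) {
            Map<String, Double> tf = termFrequencies(page);
            for (Map.Entry<String, Double> e : tf.entrySet()) {
                totalTf.merge(e.getKey(), e.getValue(), Double::sum);
            }
            for (String term : new HashSet<>(tf.keySet())) {
                df.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Double> e : totalTf.entrySet()) {
            double idf = Math.log(1.0 + (double) pages.size() / df.get(e.getKey()));
            scores.put(e.getKey(), e.getValue() * idf);
        }
        return topN(scores, n);
    }

    public static void main(String[] args) {
        List<String> segment = new ArrayList<>();
        segment.add("nutch crawl nutch");
        segment.add("crawl web");
        segment.add("crawl lucene");
        System.out.println(topNPerPage(segment.get(0), DEFAULT_TOP_N));
        System.out.println(topNPerSegment(segment, 2));
    }
}
```

In this toy segment, "crawl" is the most frequent term overall but appears on every page, so the tf-idf ranking places "nutch" ahead of it. That is the kind of per-segment signal the metric is meant to surface.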
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)