Kim Whitehall created NUTCH-2125:
------------------------------------

             Summary: Metrics
                 Key: NUTCH-2125
                 URL: https://issues.apache.org/jira/browse/NUTCH-2125
             Project: Nutch
          Issue Type: Improvement
          Components: tool
    Affects Versions: 1.10
            Reporter: Kim Whitehall


Purpose: a metric for determining the “relevancy” of a crawl after each 
round and the “relevancy” of a page. NB: this is not a scoring plugin. By 
default, the first 25 terms will be stored. 

- Return the topN terms per page

- Return the topN terms per segment, based on tf-idf

- Leverage Apache Lucene libs
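
To make the two "topN" items above concrete, here is a rough sketch in plain
Python. It is not the proposed Nutch tool and does not use Lucene. The function
names, the whitespace tokenizer, and the smoothed idf formula are all
assumptions chosen for illustration; only the default of 25 terms comes from
the description above.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, split on whitespace,
    # keep purely alphabetic tokens. A real tool would use an analyzer.
    return [tok for tok in text.lower().split() if tok.isalpha()]


def top_terms_per_page(text, n=25):
    # Rank the terms of one page by raw term frequency.
    # Ties are broken alphabetically so the result is deterministic.
    counts = Counter(tokenize(text))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def top_terms_per_segment(pages, n=25):
    # pages: hypothetical dict mapping URL -> page text for one segment.
    # Each term is scored by its tf-idf summed over all pages in the segment.
    docs = [Counter(tokenize(text)) for text in pages.values()]
    num_docs = len(docs)

    # Document frequency: in how many pages does each term occur?
    df = Counter()
    for counts in docs:
        df.update(counts.keys())

    scores = Counter()
    for counts in docs:
        total = sum(counts.values())
        for term, count in counts.items():
            tf = count / total
            # Smoothed idf, one common variant (assumption).
            idf = math.log((1 + num_docs) / (1 + df[term])) + 1.0
            scores[term] += tf * idf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    segment = {
        "http://example.org/a": "nutch crawl crawl",
        "http://example.org/b": "nutch index",
    }
    print(top_terms_per_page(segment["http://example.org/a"], n=2))
    print(top_terms_per_segment(segment, n=2))
```

Summing tf-idf over the pages of a segment is only one way to aggregate; taking
the maximum or the mean per term would be equally plausible readings of "topN
terms per segment".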



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
