Markus Jelsma created NUTCH-2912:
------------------------------------
Summary: CrawlDatumProcessor to calculate crawl completeness
Key: NUTCH-2912
URL: https://issues.apache.org/jira/browse/NUTCH-2912
Project: Nutch
Issue Type: Improvement
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Simple processor that calculates the completeness of the crawl per host.
This does not account for unknown unknowns, i.e. URLs that have not been
discovered yet and are therefore not in the CrawlDb. As a result, the
calculated percentage can be highly volatile for crawls that have only
just started.
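A minimal sketch of the idea, assuming completeness is defined as the share
of a host's known CrawlDb entries that are no longer unfetched. This is NOT
Nutch's CrawlDatumProcessor API. The class CompletenessSketch, its Status
enum (a simplified stand-in for CrawlDatum statuses), and the rule that every
non-unfetched status counts as "done" are all hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: per-host completeness = processed URLs / known URLs.
 * Not Nutch's API; statuses are simplified stand-ins for CrawlDatum statuses.
 */
public class CompletenessSketch {

    // Simplified stand-in for CrawlDb statuses (assumption).
    enum Status { UNFETCHED, FETCHED, GONE, REDIRECT }

    // Per-host counters: URLs known to the CrawlDb and URLs already processed.
    static final class HostStats {
        long known;
        long done;

        double completeness() {
            return known == 0 ? 0.0 : (double) done / known;
        }
    }

    private final Map<String, HostStats> stats = new HashMap<>();

    /** Records one CrawlDb entry; any status other than UNFETCHED counts as done (assumption). */
    public void add(String url, Status status) {
        String host = URI.create(url).getHost();
        HostStats s = stats.computeIfAbsent(host, h -> new HostStats());
        s.known++;
        if (status != Status.UNFETCHED) {
            s.done++;
        }
    }

    /** Returns the completeness ratio for a host, or 0.0 if the host is unknown. */
    public double completeness(String host) {
        HostStats s = stats.get(host);
        return s == null ? 0.0 : s.completeness();
    }

    public static void main(String[] args) {
        CompletenessSketch c = new CompletenessSketch();
        c.add("https://example.org/", Status.FETCHED);
        c.add("https://example.org/a", Status.FETCHED);
        c.add("https://example.org/b", Status.UNFETCHED);
        c.add("https://example.org/c", Status.GONE);
        System.out.println(c.completeness("example.org")); // 0.75

        // Newly discovered outlinks enlarge the denominator, so the ratio
        // drops even though nothing was "un-fetched": the volatility above.
        for (String p : new String[] {"d", "e", "f", "g"}) {
            c.add("https://example.org/" + p, Status.UNFETCHED);
        }
        System.out.println(c.completeness("example.org")); // 0.375
    }
}
```

Because only URLs already in the CrawlDb enter the denominator, the ratio
can only describe the known part of a host. Early in a crawl each batch of
newly discovered links can move it sharply.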
--
This message was sent by Atlassian Jira
(v8.20.1#820001)