Markus Jelsma created NUTCH-2912:
------------------------------------

             Summary: CrawlDatumProcessor to calculate crawl completeness
                 Key: NUTCH-2912
                 URL: https://issues.apache.org/jira/browse/NUTCH-2912
             Project: Nutch
          Issue Type: Improvement
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma

A simple processor that calculates the completeness of the crawl per host. It does not account for known unknowns, e.g. URLs that have not been discovered yet. The calculated percentage can therefore be highly volatile early in a crawl.
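
To illustrate why the percentage is volatile early on, here is a minimal sketch. It assumes completeness is defined as fetched URLs divided by known URLs for a host; the issue does not state the exact formula, so that definition, the class name `CrawlCompleteness`, the method `percentage`, and the sample numbers are all hypothetical. It does not use the Nutch API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the NUTCH-2912 implementation.
final class CrawlCompleteness {

  // Assumed definition: share of a host's known URLs that have been fetched, in percent.
  static double percentage(long fetched, long known) {
    if (known <= 0) {
      return 0.0; // no known URLs for this host yet, so report zero
    }
    return 100.0 * fetched / known;
  }

  public static void main(String[] args) {
    // label -> {fetched, known}; made-up counts for a single host at two points in time
    Map<String, long[]> snapshots = new LinkedHashMap<>();
    snapshots.put("example.org, early in the crawl", new long[] {1, 2});
    snapshots.put("example.org, after 98 more URLs were discovered", new long[] {1, 100});

    for (Map.Entry<String, long[]> entry : snapshots.entrySet()) {
      long[] counts = entry.getValue();
      System.out.printf("%s: %.1f%%%n", entry.getKey(), percentage(counts[0], counts[1]));
    }
  }
}
```

Under this assumed definition, the same host goes from 50% to 1% complete without any change in fetched pages, simply because newly discovered URLs enlarge the denominator.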