Markus Jelsma created NUTCH-2912:
------------------------------------

             Summary: CrawlDatumProcessor to calculate crawl completeness
                 Key: NUTCH-2912
                 URL: https://issues.apache.org/jira/browse/NUTCH-2912
             Project: Nutch
          Issue Type: Improvement
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma


Simple processor that calculates the completeness of the crawl per host.

 

This does not account for known unknowns, e.g. URLs that have not been 
discovered yet. The calculated percentage can therefore be highly volatile 
in the early stages of a crawl.
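
To illustrate the idea and the volatility, here is a minimal sketch in Python. It is not the Nutch implementation and does not use the CrawlDatumProcessor API. It assumes that "completeness" means the percentage of a host's known URLs whose status is anything other than db_unfetched; the issue does not state the exact formula. The function name completeness_per_host and the (url, status) input format are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def completeness_per_host(records):
    """Return {host: percentage of known URLs that are no longer unfetched}.

    `records` is an iterable of (url, status) pairs. The status strings
    mimic CrawlDb status names, but the input format is an assumption
    made for this sketch, not something Nutch provides in this shape.
    """
    totals = defaultdict(int)     # all known URLs per host
    processed = defaultdict(int)  # URLs per host that are not db_unfetched

    for url, status in records:
        host = urlsplit(url).hostname
        if host is None:
            # Skip entries without a parseable host.
            continue
        totals[host] += 1
        if status != "db_unfetched":
            processed[host] += 1

    # Assumed definition: processed / known, as a percentage.
    return {host: 100.0 * processed[host] / totals[host] for host in totals}
```

With this definition, a host with one known URL that has been fetched reports 100%. If that page then yields eight newly discovered, unfetched links on the same host, the figure drops to 1 of 9, roughly 11%. That is the kind of swing to expect while a crawl is still discovering most of its URLs.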



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
