Crawldb update to total counts per status
-----------------------------------------
Key: NUTCH-1071
URL: https://issues.apache.org/jira/browse/NUTCH-1071
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Julien Nioche
Assignee: Julien Nioche
Priority: Trivial
Fix For: 1.4
The reduce phase of the crawldb update emits every entry that will appear in
the updated crawldb. We can use Hadoop counters in that phase to summarise the
number of URLs per status. This is similar to the readdb -stats functionality,
except that it does not require an additional step.
The counters are a useful way of monitoring the progress of a crawl from the
Hadoop JobTracker UI.
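
The idea can be sketched as follows. This is only an illustration under stated
assumptions, not the actual patch: the class StatusCounterSketch and the method
countByStatus are hypothetical, and a plain map stands in for Hadoop's counter
mechanism so that the example runs without a cluster. In a reducer written
against the old mapred API, the increment would presumably be a call to
Reporter.incrCounter(group, counter, amount), with a group name chosen by the
implementer and the status name as the counter name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StatusCounterSketch {

    // Hypothetical stand-in for the per-status counters. Each element of
    // 'statuses' represents the status name of one entry emitted by the
    // reduce phase of the crawldb update.
    static Map<String, Long> countByStatus(List<String> statuses) {
        Map<String, Long> counters = new TreeMap<>();
        for (String status : statuses) {
            // In a real reducer this line would be the counter increment,
            // e.g. reporter.incrCounter(groupName, status, 1).
            counters.merge(status, 1L, Long::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        // Status names here are illustrative sample data only.
        List<String> emitted =
            List.of("db_fetched", "db_unfetched", "db_fetched", "db_gone");
        System.out.println(countByStatus(emitted));
    }
}
```

Because the counts are accumulated while the entries are being written, the
totals come for free with the update job and no separate pass over the crawldb
is needed.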