DomainStats should process numeric CrawlDB metadata
---------------------------------------------------
Key: NUTCH-1149
URL: https://issues.apache.org/jira/browse/NUTCH-1149
Project: Nutch
Issue Type: Improvement
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Trivial
Fix For: 1.5
Currently the DomainStats program only outputs the number of fetched records
per domain or host. It should also be able to aggregate numeric metadata
fields, for example to compute the average size (content length) for a given
domain or host. This would also be useful for generating a per-domain or
per-host metric for adult material when using a plugin that stores a
probability factor of adult material per URL in the CrawlDB.
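
To make the request concrete, here is a minimal sketch of the aggregation
described above: group records by host, sum a numeric metadata field, and
derive the average. It is plain Java and deliberately does not use the Nutch or
Hadoop APIs. The class name HostMetadataStats, the record layout (URL mapped to
a string metadata map) and the "Content-Length" key are assumptions made for
illustration, not the actual DomainStats code or CrawlDB schema.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Nutch.
public class HostMetadataStats {

    /** Running sum and record count for one host. */
    static final class Stat {
        double sum;
        long count;

        double average() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    /** Returns the host part of a URL, or null if it cannot be determined. */
    static String hostOf(String url) {
        try {
            return URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    /**
     * Aggregates one numeric metadata field per host.
     * Records that lack the field, or whose value is not numeric, are skipped.
     */
    static Map<String, Stat> aggregate(Map<String, Map<String, String>> records,
                                       String field) {
        Map<String, Stat> stats = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> record : records.entrySet()) {
            String host = hostOf(record.getKey());
            String raw = record.getValue().get(field);
            if (host == null || raw == null) {
                continue;
            }
            double value;
            try {
                value = Double.parseDouble(raw.trim());
            } catch (NumberFormatException e) {
                continue; // non-numeric metadata is ignored
            }
            Stat stat = stats.computeIfAbsent(host, h -> new Stat());
            stat.sum += value;
            stat.count++;
        }
        return stats;
    }

    public static void main(String[] args) {
        // Assumed sample data: URL -> metadata map.
        Map<String, Map<String, String>> records = new LinkedHashMap<>();
        records.put("http://example.org/a", Map.of("Content-Length", "100"));
        records.put("http://example.org/b", Map.of("Content-Length", "300"));
        records.put("http://example.com/", Map.of("Content-Length", "n/a"));

        // Print host, record count, and average of the field per host.
        aggregate(records, "Content-Length").forEach((host, stat) ->
            System.out.printf("%s\t%d\t%.1f%n", host, stat.count, stat.average()));
    }
}
```

In a MapReduce job the same idea would presumably map onto emitting the host as
key and the parsed value as output in the mapper, with the sum and count
accumulated in the reducer. The same accumulator would also cover the adult
material use case, with the probability factor as the aggregated field.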