DomainStats should process numeric CrawlDB metadata
---------------------------------------------------

                 Key: NUTCH-1149
                 URL: https://issues.apache.org/jira/browse/NUTCH-1149
             Project: Nutch
          Issue Type: Improvement
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma
            Priority: Trivial
             Fix For: 1.5


Right now the DomainStats program only outputs the number of fetched records per
domain or host. It should also be able to aggregate numeric CrawlDB metadata
values, for example to get the average size (content length) for a given domain
or host. This is also useful for generating a metric for adult material (by
domain or host) when using a plugin that stores a probability factor of adult
material per URL in the CrawlDB.
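A minimal sketch of the per-host aggregation described above, written in plain
Java rather than as a Hadoop job. It assumes each CrawlDB record has already
been reduced to a URL plus one numeric metadata value (for example content
length, or the adult-material probability stored by a plugin). The `Entry` and
`Stat` types and the `aggregateByHost` method are hypothetical illustrations,
not part of the existing DomainStats code. Grouping by domain instead of host
would need an extra domain-extraction step, which is omitted here.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DomainStatsSketch {

    /** Hypothetical stand-in for a CrawlDB entry: a URL plus one numeric metadata value. */
    record Entry(String url, double value) {}

    /** Running count and sum for one host; the average is derived from both. */
    static final class Stat {
        long count;
        double sum;

        void add(double v) {
            count++;
            sum += v;
        }

        double average() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    /** Groups entries by host and accumulates the count and sum of the numeric value. */
    static Map<String, Stat> aggregateByHost(List<Entry> entries) {
        Map<String, Stat> stats = new LinkedHashMap<>();
        for (Entry e : entries) {
            String host = URI.create(e.url()).getHost();
            if (host == null) {
                continue; // skip URLs without a host component
            }
            stats.computeIfAbsent(host, h -> new Stat()).add(e.value());
        }
        return stats;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
            new Entry("http://example.org/a", 1000),
            new Entry("http://example.org/b", 3000),
            new Entry("http://example.net/", 500));
        // Prints host, record count and average value, tab-separated.
        aggregateByHost(entries).forEach((host, s) ->
            System.out.printf("%s\t%d\t%.1f%n", host, s.count, s.average()));
    }
}
```

Keeping a sum and a count per key, rather than a running average, means partial
results computed by separate tasks can be merged by plain addition, which is
what makes this kind of statistic easy to compute in a MapReduce combiner or
reducer.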

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira