[ https://issues.apache.org/jira/browse/NUTCH-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530656 ]
Enis Soztutar commented on NUTCH-558:
-------------------------------------
I wonder why you do not use URLUtils, introduced in NUTCH-439. There is also a
similar tool (not committed) in this patch which extracts URL/domain/TLD
statistics from the crawldb, but it lacks filtering.
> Need tool to retrieve domain statistics
> ---------------------------------------
>
> Key: NUTCH-558
> URL: https://issues.apache.org/jira/browse/NUTCH-558
> Project: Nutch
> Issue Type: New Feature
> Affects Versions: 0.9.0
> Reporter: Chris Schneider
> Assignee: Chris Schneider
> Attachments: DomainStats.patch
>
>
> Several developers have expressed interest in a tool to retrieve statistics
> from a crawl on a domain basis (e.g., how many pages were successfully
> fetched from www.apache.org vs. apache.org, where the latter total would
> include the former).
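
To make the roll-up described above concrete, here is a minimal sketch of
counting fetched URLs per host and per parent domain, so that the total for
apache.org includes the pages from www.apache.org. It is not the code in
DomainStats.patch and does not use any Nutch API; the class and method names
(DomainStatsSketch, countBySuffix) and the sample URLs are hypothetical. It
also assumes plain dot-splitting of the host, so it counts the bare TLD too
and knows nothing about multi-part suffixes such as co.uk.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DomainStatsSketch {

    // Counts each URL once for its host and once for every parent suffix of
    // that host, so "www.apache.org" also contributes to "apache.org" and "org".
    // Assumption: URLs are well-formed; URI.create throws on malformed input.
    static Map<String, Integer> countBySuffix(List<String> urls) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String url : urls) {
            String host = URI.create(url).getHost();
            if (host == null) {
                continue; // no parsable host, nothing to count
            }
            String suffix = host.toLowerCase();
            while (true) {
                counts.merge(suffix, 1, Integer::sum);
                int dot = suffix.indexOf('.');
                if (dot < 0) {
                    break; // reached the last label (the TLD)
                }
                suffix = suffix.substring(dot + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical list of successfully fetched URLs.
        List<String> fetched = List.of(
            "http://www.apache.org/",
            "http://www.apache.org/foundation/",
            "http://apache.org/",
            "http://lucene.apache.org/nutch/");
        countBySuffix(fetched)
            .forEach((name, n) -> System.out.println(name + "\t" + n));
    }
}
```

A real tool would read URLs and fetch status from the crawldb rather than
from an in-memory list, and would likely need a proper domain-suffix table to
decide where a registered domain ends.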
--
This message is automatically generated by JIRA.