[
https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498689#comment-14498689
]
Michael Joyce commented on NUTCH-1911:
--------------------------------------
Hey folks,
Here's what the output from this looks like:
{code}
Usage: DomainStatistics inputDirs outDir mode [numOfReducers]
    inputDirs         Comma-separated list of crawldb input directories
                      E.g.: crawl/crawldb/current/
    outDir            Output directory where results should be dumped
    mode              Set statistics gathering mode
                          host    Gather statistics by host
                          domain  Gather statistics by domain
                          suffix  Gather statistics by suffix
                          tld     Gather statistics by top level domain
    [numOfReducers]   Optional number of reduce jobs to use. Defaults to 1.
{code}
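For illustration, the positional arguments in that usage text could be validated along the lines below. This is a minimal sketch, not the code in the patch: the class `DomainStatsArgs`, its fields, and the error messages are hypothetical. It assumes exact-match (case-sensitive) mode names and a reducer count of at least 1, and it ignores blank entries in the comma-separated input list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical argument holder for the usage shown above; not Nutch's actual class.
final class DomainStatsArgs {
  // Modes listed in the usage text (matched exactly, an assumption of this sketch).
  static final List<String> MODES = Arrays.asList("host", "domain", "suffix", "tld");
  static final String USAGE =
      "Usage: DomainStatistics inputDirs outDir mode [numOfReducers]";

  final List<String> inputDirs;
  final String outDir;
  final String mode;
  final int numReducers;

  private DomainStatsArgs(List<String> inputDirs, String outDir, String mode, int numReducers) {
    this.inputDirs = Collections.unmodifiableList(inputDirs);
    this.outDir = outDir;
    this.mode = mode;
    this.numReducers = numReducers;
  }

  /** Validates the positional arguments; throws IllegalArgumentException with a hint on bad input. */
  static DomainStatsArgs parse(String[] args) {
    if (args.length < 3 || args.length > 4) {
      throw new IllegalArgumentException(USAGE);
    }
    // inputDirs: comma-separated, surrounding whitespace trimmed, blank entries ignored.
    List<String> dirs = new ArrayList<>();
    for (String d : args[0].split(",")) {
      String trimmed = d.trim();
      if (!trimmed.isEmpty()) {
        dirs.add(trimmed);
      }
    }
    if (dirs.isEmpty()) {
      throw new IllegalArgumentException("No input directories given\n" + USAGE);
    }
    String mode = args[2];
    if (!MODES.contains(mode)) {
      throw new IllegalArgumentException("Unknown mode '" + mode + "', expected one of " + MODES);
    }
    // Optional reducer count, defaulting to 1 as the usage text states.
    int reducers = 1;
    if (args.length == 4) {
      try {
        reducers = Integer.parseInt(args[3]);
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException("numOfReducers must be an integer: " + args[3], e);
      }
      if (reducers < 1) {
        throw new IllegalArgumentException("numOfReducers must be at least 1");
      }
    }
    return new DomainStatsArgs(dirs, args[1], mode, reducers);
  }
}
```

Failing fast with a specific message (unknown mode, non-numeric reducer count) tells the user which argument is wrong. A single generic usage dump leaves them to work that out.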
> Improve DomainStatistics tool command line parsing
> --------------------------------------------------
>
> Key: NUTCH-1911
> URL: https://issues.apache.org/jira/browse/NUTCH-1911
> Project: Nutch
> Issue Type: Bug
> Components: util
> Affects Versions: 1.9, 2.2.1
> Reporter: Lewis John McGibbney
> Priority: Trivial
> Fix For: 1.11
>
>
> The DomainStatistics tool could be improved based on the comments made in
> [this mail thread|http://www.mail-archive.com/user%40nutch.apache.org/msg13028.html].
> For convenience, I've also pasted them below:
> {quote}
> You cannot just tell it where the crawldb is; you need to tell it where the
> directory is, so specifying current is OK, but not part-*.
> {quote}
> The patch should be trivial.
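The restriction quoted from the mailing list (a directory such as current is fine, a part-* file is not) could be enforced with an early check before the job starts. This is a minimal sketch under stated assumptions: `CrawlDbPathCheck` and its methods are hypothetical names. The check inspects only the last path component's name and does not consult the filesystem, so querying the filesystem would be more robust in a real implementation.

```java
// Hypothetical pre-flight check; the name-based test is an assumption and
// does not consult the filesystem.
final class CrawlDbPathCheck {
  /** Returns the last path component, ignoring trailing slashes. */
  static String lastComponent(String path) {
    String p = path;
    while (p.length() > 1 && p.endsWith("/")) {
      p = p.substring(0, p.length() - 1);
    }
    int slash = p.lastIndexOf('/');
    return slash >= 0 ? p.substring(slash + 1) : p;
  }

  /** Rejects inputs whose last component looks like a part file (part-00000, part-*). */
  static void requireCrawlDbDirectory(String path) {
    if (lastComponent(path).startsWith("part-")) {
      throw new IllegalArgumentException("'" + path
          + "' looks like a part file; pass the directory that contains it"
          + " (e.g. crawl/crawldb/current/) instead");
    }
  }
}
```

Each comma-separated input directory would be passed through `requireCrawlDbDirectory` before the job is configured.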
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)