[
https://issues.apache.org/jira/browse/NUTCH-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498578#comment-14498578
]
ASF GitHub Bot commented on NUTCH-1906:
---------------------------------------
GitHub user MJJoyce opened a pull request:
https://github.com/apache/nutch/pull/20
NUTCH-1906 - Remove duplicate stats flag listing in readdb help
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MJJoyce/nutch NUTCH-1906
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nutch/pull/20.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20
----
commit f33dfb8df1362cfc69d26a813f5b85c9b7a75020
Author: Michael Joyce <[email protected]>
Date: 2015-04-16T19:45:52Z
NUTCH-1906 - Remove duplicate stats flag listing in readdb help
----
> Typo in CrawlDbReader command line help
> ---------------------------------------
>
> Key: NUTCH-1906
> URL: https://issues.apache.org/jira/browse/NUTCH-1906
> Project: Nutch
> Issue Type: Bug
> Components: crawldb
> Affects Versions: 1.9
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Trivial
> Fix For: 1.11
>
>
> Currently the CrawlDbReader tool, when invoked without any command-line
> arguments, prints the following usage message:
> {code}
> [mdeploy@crawl local]$ ./bin/nutch readdb
> Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
> <crawldb> directory name where crawldb is located
> -stats [-sort] print overall statistics to System.out
> [-sort] list status sorted by host
> -dump <out_dir> [-format normal|csv|crawldb] dump the whole db to a text file in <out_dir>
> [-format csv] dump in Csv format
> [-format normal] dump in standard format (default option)
> [-format crawldb] dump as CrawlDB
> [-regex <expr>] filter records with expression
> [-retry <num>] minimum retry count
> [-status <status>] filter records by CrawlDatum status
> -url <url> print information on <url> to System.out
> -topN <nnnn> <out_dir> [<min>] dump top <nnnn> urls sorted by score to <out_dir>
> [<min>] skip records with scores below this value.
> This can significantly improve performance.
> {code}
> The part of the output that bothers me is
> {code}
> -stats [-sort] print overall statistics to System.out
> [-sort] list status sorted by host
> {code}
> The second -sort entry repeats a flag already listed on the -stats line and
> is not needed.
> Having looked through the code, I found no other optional flag that the
> second entry could stand for (I thought it might be a placeholder for
> something else), so we can just remove it.
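
As a rough illustration of the cleanup described above, the following is a minimal, self-contained Java sketch. It is not the CrawlDbReader source and not the content of the pull request: the class name `ReaddbUsageSketch`, both helper methods, and the wording of the merged help line are assumptions made only for this example. It shows the quoted two-line listing with its repeated `[-sort]` next to one possible single-line form, plus a small check that counts how often the flag is listed.

```java
public class ReaddbUsageSketch {

    // The two help lines quoted in the issue; [-sort] is listed twice.
    static String originalStatsUsage() {
        return "\t-stats [-sort] \tprint overall statistics to System.out\n"
             + "\t\t[-sort]\tlist status sorted by host\n";
    }

    // Hypothetical wording (an assumption, not the actual patch):
    // a single [-sort] listing that keeps both descriptions.
    static String dedupedStatsUsage() {
        return "\t-stats [-sort] \tprint overall statistics to System.out;"
             + " with -sort, list status sorted by host\n";
    }

    // Counts non-overlapping occurrences of token in text.
    static int countOccurrences(String text, String token) {
        if (token.isEmpty()) {
            throw new IllegalArgumentException("token must not be empty");
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(token, from)) != -1) {
            count++;
            from += token.length();
        }
        return count;
    }

    public static void main(String[] args) {
        System.err.print(dedupedStatsUsage());
        System.err.println("[-sort] listings before: "
            + countOccurrences(originalStatsUsage(), "[-sort]"));
        System.err.println("[-sort] listings after:  "
            + countOccurrences(dedupedStatsUsage(), "[-sort]"));
    }
}
```

A check like `countOccurrences` could be dropped into a unit test to catch a flag that is listed twice in usage text; whether that is worth doing for a trivial help-message fix is a judgment call.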