Lewis John McGibbney created NUTCH-1906:
-------------------------------------------
Summary: Typo in CrawlDbReader command line help
Key: NUTCH-1906
URL: https://issues.apache.org/jira/browse/NUTCH-1906
Project: Nutch
Issue Type: Bug
Components: crawldb
Affects Versions: 1.9
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Priority: Trivial
Fix For: 1.10
Currently the CrawlDbReader tool, when invoked without any command line
arguments, prints the following usage message:
{code}
[mdeploy@crawl local]$ ./bin/nutch readdb
Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
<crawldb> directory name where crawldb is located
-stats [-sort] print overall statistics to System.out
[-sort] list status sorted by host
-dump <out_dir> [-format normal|csv|crawldb] dump the whole db to a text file in <out_dir>
[-format csv] dump in Csv format
[-format normal] dump in standard format (default option)
[-format crawldb] dump as CrawlDB
[-regex <expr>] filter records with expression
[-retry <num>] minimum retry count
[-status <status>] filter records by CrawlDatum status
-url <url> print information on <url> to System.out
-topN <nnnn> <out_dir> [<min>] dump top <nnnn> urls sorted by score to <out_dir>
[<min>] skip records with scores below this value.
This can significantly improve performance.
{code}
The part that bothers me is:
{code}
-stats [-sort] print overall statistics to System.out
[-sort] list status sorted by host
{code}
Listing -sort twice is unnecessary.
Having looked through the code, I found no other optional flag that the
second entry could stand for (I had thought it might be a placeholder for
something else), so we can simply remove it.
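As a rough illustration of the change proposed above, the -stats help text could be collapsed to a single entry. This is only a sketch: the class name StatsUsageSketch, the helper statsUsage(), and the merged wording of the description are assumptions made for illustration, not the actual CrawlDbReader source.

```java
public class StatsUsageSketch {

    // Hypothetical helper: builds the help text for the -stats option with
    // the duplicated [-sort] entry folded into a single line. The wording
    // of the merged description is an assumption, not Nutch's own text.
    static String statsUsage() {
        return "\t-stats [-sort] \tprint overall statistics to System.out,"
            + " optionally listing status sorted by host";
    }

    public static void main(String[] args) {
        // Printing usage text to stderr is also an assumption of this sketch.
        System.err.println(statsUsage());
    }
}
```

With this shape, -sort is documented exactly once, next to the option it modifies.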