[ https://issues.apache.org/jira/browse/NUTCH-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498578#comment-14498578 ]
ASF GitHub Bot commented on NUTCH-1906:
---------------------------------------

GitHub user MJJoyce opened a pull request:

    https://github.com/apache/nutch/pull/20

    NUTCH-1906 - Remove duplicate stats flag listing in readdb help

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MJJoyce/nutch NUTCH-1906

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/20.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20

----
commit f33dfb8df1362cfc69d26a813f5b85c9b7a75020
Author: Michael Joyce <mltjo...@gmail.com>
Date:   2015-04-16T19:45:52Z

    NUTCH-1906 - Remove duplicate stats flag listing in readdb help

----

> Typo in CrawlDbReader command line help
> ---------------------------------------
>
>                 Key: NUTCH-1906
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1906
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb
>    Affects Versions: 1.9
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Trivial
>             Fix For: 1.11
>
>
> Currently the CrawlDbReader tool, when invoked without any command line
> arguments helps us as follows
> {code}
> [mdeploy@crawl local]$ ./bin/nutch readdb
> Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn>
> <out_dir> [<min>] | -url <url>)
>     <crawldb>   directory name where crawldb is located
>     -stats [-sort]      print overall statistics to System.out
>         [-sort] list status sorted by host
>     -dump <out_dir> [-format normal|csv|crawldb]        dump the whole db to a
> text file in <out_dir>
>         [-format csv]   dump in Csv format
>         [-format normal]        dump in standard format (default option)
>         [-format crawldb]       dump as CrawlDB
>         [-regex <expr>] filter records with expression
>         [-retry <num>]  minimum retry count
>         [-status <status>]      filter records by CrawlDatum status
>     -url <url>  print information on <url> to System.out
>     -topN <nnnn> <out_dir> [<min>]      dump top <nnnn> urls sorted by score to
> <out_dir>
>         [<min>] skip records with scores below this value.
>                 This can significantly improve performance.
> {code}
> The code that bothers me is
> {code}
>     -stats [-sort]      print overall statistics to System.out
>         [-sort] list status sorted by host
> {code}
> The inclusion of the double -sort is not necessary or required.
> Having looked through the code there is no other optional flag which we can
> substitute for the second one (which I thought may lead to this being a
> placeholder for something else) therefore we can just remove it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)