[jira] [Commented] (NUTCH-1906) Typo in CrawlDbReader command line help

ASF GitHub Bot (JIRA) Thu, 16 Apr 2015 12:47:42 -0700

    [ https://issues.apache.org/jira/browse/NUTCH-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498578#comment-14498578 ]


ASF GitHub Bot commented on NUTCH-1906:
---------------------------------------

GitHub user MJJoyce opened a pull request:

    https://github.com/apache/nutch/pull/20

    NUTCH-1906 - Remove duplicate stats flag listing in readdb help

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MJJoyce/nutch NUTCH-1906

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/20.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20
    
----
commit f33dfb8df1362cfc69d26a813f5b85c9b7a75020
Author: Michael Joyce <mltjo...@gmail.com>
Date:   2015-04-16T19:45:52Z

    NUTCH-1906 - Remove duplicate stats flag listing in readdb help

----


> Typo in CrawlDbReader command line help
> ---------------------------------------
>
>                 Key: NUTCH-1906
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1906
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb
>    Affects Versions: 1.9
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Trivial
>             Fix For: 1.11
>
>
> Currently the CrawlDbReader tool, when invoked without any command line 
> arguments, prints the following help:
> {code}
> [mdeploy@crawl local]$ ./bin/nutch readdb
> Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
>       <crawldb>       directory name where crawldb is located
>       -stats [-sort]  print overall statistics to System.out
>               [-sort] list status sorted by host
>       -dump <out_dir> [-format normal|csv|crawldb]    dump the whole db to a text file in <out_dir>
>               [-format csv]   dump in Csv format
>               [-format normal]        dump in standard format (default option)
>               [-format crawldb]       dump as CrawlDB
>               [-regex <expr>] filter records with expression
>               [-retry <num>]  minimum retry count
>               [-status <status>]      filter records by CrawlDatum status
>       -url <url>      print information on <url> to System.out
>       -topN <nnnn> <out_dir> [<min>]  dump top <nnnn> urls sorted by score to <out_dir>
>               [<min>] skip records with scores below this value.
>                       This can significantly improve performance.
> {code}
> The part that bothers me is
> {code}
>       -stats [-sort]  print overall statistics to System.out
>               [-sort] list status sorted by host
> {code}
> The second -sort listing is redundant.
> Having looked through the code, I found no other optional flag that the 
> second line could stand for (I thought it might be a placeholder for 
> something else), so we can simply remove it.
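
As an illustration of the proposed cleanup, here is a minimal Java sketch of a help printer that lists the -stats option only once. The class name ReaddbHelpSketch and the method buildHelp are hypothetical and are not taken from the Nutch source. Only the help wording comes from the output quoted above, and the sketch covers just the first few help lines. How the actual patch changes CrawlDbReader is not shown in this message.

```java
// Hypothetical sketch; NOT the actual CrawlDbReader source.
class ReaddbHelpSketch {

    // Builds the start of the readdb help text with a single -stats entry.
    // The wording is copied from the help output quoted in the issue; the
    // repeated "[-sort] list status sorted by host" line is left out.
    static String buildHelp() {
        StringBuilder sb = new StringBuilder();
        sb.append("Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | ");
        sb.append("-topN <nnnn> <out_dir> [<min>] | -url <url>)\n");
        sb.append("\t<crawldb>\tdirectory name where crawldb is located\n");
        sb.append("\t-stats [-sort] \tprint overall statistics to System.out\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Usage messages conventionally go to standard error.
        System.err.print(buildHelp());
    }
}
```

With this layout, each option appears on exactly one line, so a duplicate like the one reported would be easy to spot in review.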



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
