Lewis John McGibbney created NUTCH-1906:
-------------------------------------------

             Summary: Typo in CrawlDbReader command line help
                 Key: NUTCH-1906
                 URL: https://issues.apache.org/jira/browse/NUTCH-1906
             Project: Nutch
          Issue Type: Bug
          Components: crawldb
    Affects Versions: 1.9
            Reporter: Lewis John McGibbney
            Assignee: Lewis John McGibbney
            Priority: Trivial
             Fix For: 1.10


Currently the CrawlDbReader tool, when invoked without any command line
arguments, prints the following usage message:
{code}
[mdeploy@crawl local]$ ./bin/nutch readdb
Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
        <crawldb>       directory name where crawldb is located
        -stats [-sort]  print overall statistics to System.out
                [-sort] list status sorted by host
        -dump <out_dir> [-format normal|csv|crawldb]    dump the whole db to a text file in <out_dir>
                [-format csv]   dump in Csv format
                [-format normal]        dump in standard format (default option)
                [-format crawldb]       dump as CrawlDB
                [-regex <expr>] filter records with expression
                [-retry <num>]  minimum retry count
                [-status <status>]      filter records by CrawlDatum status
        -url <url>      print information on <url> to System.out
        -topN <nnnn> <out_dir> [<min>]  dump top <nnnn> urls sorted by score to <out_dir>
                [<min>] skip records with scores below this value.
                        This can significantly improve performance.
{code}
The part of the output that bothers me is:
{code}
        -stats [-sort]  print overall statistics to System.out
                [-sort] list status sorted by host
{code}
Listing -sort twice is unnecessary.
Having looked through the code, I found no other optional flag that could
take the place of the second one (I had thought it might be a placeholder
for something else), so we can simply remove it.
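
As a rough illustration of the proposed change, the -stats help text would shrink to a single line. This is only a sketch: the class name CrawlDbReaderUsageSketch and the helper statsUsage() are hypothetical and are not taken from the Nutch source, and the real tool may build its usage message differently.

```java
// Hypothetical sketch, not the actual Nutch CrawlDbReader source.
class CrawlDbReaderUsageSketch {

    // Assumed helper: returns the help text for the -stats option
    // once the duplicate "[-sort]" line has been dropped.
    static String statsUsage() {
        return "\t-stats [-sort]\tprint overall statistics to System.out";
    }

    public static void main(String[] args) {
        // Print only the part of the usage message this issue touches.
        System.err.println("\t<crawldb>\tdirectory name where crawldb is located");
        System.err.println(statsUsage());
    }
}
```

With this shape, -sort appears exactly once in the -stats portion of the help output.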



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
