[jira] [Commented] (NUTCH-3113) Group commands in bin/nutch command-line help

ASF GitHub Bot (Jira) Wed, 09 Jul 2025 07:23:08 -0700


    [ https://issues.apache.org/jira/browse/NUTCH-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004178#comment-18004178 ]


ASF GitHub Bot commented on NUTCH-3113:
---------------------------------------

sebastian-nagel opened a new pull request, #852:
URL: https://github.com/apache/nutch/pull/852

   New output:
   ```
   $> $NUTCH_HOME/bin/nutch 
   nutch 1.21-SNAPSHOT
   Usage: nutch COMMAND [-Dproperty=value]... [command-specific args]...
   where COMMAND is one of:
    (Crawl commands)
     inject            inject new urls into the database
     generate          generate new segments to fetch from crawl db
     fetch             fetch a segment's pages
     parse             parse a segment's pages
     updatedb          update crawl db from segments after fetching
   
    (CrawlDb commands)
     readdb            read / dump crawl db
     mergedb           merge crawldb-s, with optional filtering
     dedup             deduplicate entries in the crawldb and assign them a special status
     domainstats       calculate domain statistics from crawldb
     protocolstats     calculate protocol status code stats from crawldb
     crawlcomplete     calculate crawl completion stats from crawldb
   
    (Segment tools)
     freegen           generate a new segment to fetch from a URL text file
     readseg           read / dump segment data
     mergesegs         merge several segments, with optional filtering and slicing
   
    (HostDb commands)
     updatehostdb      update the host db with records from the crawl db
     readhostdb        read / dump host db
     sitemap           perform Sitemap processing
   
    (LinkDb commands)
     readlinkdb        read / dump link db
     invertlinks       create a linkdb from parsed segments
     mergelinkdb       merge linkdb-s, with optional filtering
   
    (Index commands)
     index             run the plugin-based indexer on parsed segments and linkdb
     clean             remove HTTP 301 and 404 documents and duplicates from indexing backends
   
    (Webgraph commands)
     webgraph          generate a web graph from existing segments
     linkrank          run a link analysis program on the generated web graph
     scoreupdater      updates the crawldb with linkrank scores
     nodedumper        dumps the web graph's node scores
   
    (Debugging and validation tools)
     parsechecker      check the parser for a given url
     indexchecker      check the indexing filters for a given url
     filterchecker     check url filters for a given url
     normalizerchecker check url normalizers for a given url
     robotsparser      parse a robots.txt file and check whether urls are allowed or not
     plugin            load a plugin and run one of its classes main()
     junit             runs the given JUnit test
     showproperties    print Nutch/Hadoop configuration properties to stdout
   
    (Data export)
     dump              exports crawled data from segments into files
     commoncrawldump   exports crawled data from segments into common crawl data format encoded as CBOR
     warc              exports crawled data from segments at the WARC format
   
    (Nutch Server)
     startserver       runs the Nutch Server on localhost:8081
   
    (or)
     CLASSNAME         run the main of the class named CLASSNAME
   
   Most commands print help when invoked w/o parameters.
   
   ```

> Group commands in bin/nutch command-line help
> ---------------------------------------------
>
>                 Key: NUTCH-3113
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3113
>             Project: Nutch
>          Issue Type: Improvement
>          Components: CLI
>    Affects Versions: 1.20
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.21
>
>
> The 38 commands in the command-line help of bin/nutch appear in a long,
> unstructured list. Grouping the commands thematically may make it easier to
> find the appropriate command. A PR is under way; the output will look like:
> {noformat}
>  (Crawl commands)
>   inject            inject new urls into the database
>   generate          generate new segments to fetch from crawl db
>   fetch             fetch a segment's pages
>   parse             parse a segment's pages
>   updatedb          update crawl db from segments after fetching
>  (CrawlDb commands)
>   readdb            read / dump crawl db
>   mergedb           merge crawldb-s, with optional filtering
>   dedup             deduplicate entries in the crawldb and assign them a special status
>   domainstats       calculate domain statistics from crawldb
>   protocolstats     calculate protocol status code stats from crawldb
>   crawlcomplete     calculate crawl completion stats from crawldb
>  (Segment tools)
>    ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)