[PR] NUTCH-3113 Group commands in bin/nutch command-line help thematically [nutch]

via GitHub Wed, 09 Jul 2025 07:20:22 -0700


sebastian-nagel opened a new pull request, #852:
URL: https://github.com/apache/nutch/pull/852


   New output:
   ```
   $> $NUTCH_HOME/bin/nutch 
   nutch 1.21-SNAPSHOT
   Usage: nutch COMMAND [-Dproperty=value]... [command-specific args]...
   where COMMAND is one of:
    (Crawl commands)
     inject            inject new urls into the database
     generate          generate new segments to fetch from crawl db
     fetch             fetch a segment's pages
     parse             parse a segment's pages
     updatedb          update crawl db from segments after fetching
   
    (CrawlDb commands)
     readdb            read / dump crawl db
     mergedb           merge crawldb-s, with optional filtering
     dedup             deduplicate entries in the crawldb and assign them a special status
     domainstats       calculate domain statistics from crawldb
     protocolstats     calculate protocol status code stats from crawldb
     crawlcomplete     calculate crawl completion stats from crawldb
   
    (Segment tools)
     freegen           generate a new segment to fetch from a URL text file
     readseg           read / dump segment data
     mergesegs         merge several segments, with optional filtering and slicing
   
    (HostDb commands)
     updatehostdb      update the host db with records from the crawl db
     readhostdb        read / dump host db
     sitemap           perform Sitemap processing
   
    (LinkDb commands)
     readlinkdb        read / dump link db
     invertlinks       create a linkdb from parsed segments
     mergelinkdb       merge linkdb-s, with optional filtering
   
    (Index commands)
     index             run the plugin-based indexer on parsed segments and linkdb
     clean             remove HTTP 301 and 404 documents and duplicates from indexing backends
   
    (Webgraph commands)
     webgraph          generate a web graph from existing segments
     linkrank          run a link analysis program on the generated web graph
     scoreupdater      updates the crawldb with linkrank scores
     nodedumper        dumps the web graph's node scores
   
    (Debugging and validation tools)
     parsechecker      check the parser for a given url
     indexchecker      check the indexing filters for a given url
     filterchecker     check url filters for a given url
     normalizerchecker check url normalizers for a given url
     robotsparser      parse a robots.txt file and check whether urls are allowed or not
     plugin            load a plugin and run one of its classes main()
     junit             runs the given JUnit test
     showproperties    print Nutch/Hadoop configuration properties to stdout
   
    (Data export)
     dump              exports crawled data from segments into files
     commoncrawldump   exports crawled data from segments into common crawl data format encoded as CBOR
     warc              exports crawled data from segments at the WARC format
   
    (Nutch Server)
     startserver       runs the Nutch Server on localhost:8081
   
    (or)
     CLASSNAME         run the main of the class named CLASSNAME
   
   Most commands print help when invoked w/o parameters.
   
   ```

