[ https://issues.apache.org/jira/browse/NUTCH-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004178#comment-18004178 ]

ASF GitHub Bot commented on NUTCH-3113:
---------------------------------------

sebastian-nagel opened a new pull request, #852:
URL: https://github.com/apache/nutch/pull/852

   New output:
   ```
   $> $NUTCH_HOME/bin/nutch
   nutch 1.21-SNAPSHOT
   Usage: nutch COMMAND [-Dproperty=value]... [command-specific args]...
   where COMMAND is one of:
   (Crawl commands)
     inject            inject new urls into the database
     generate          generate new segments to fetch from crawl db
     fetch             fetch a segment's pages
     parse             parse a segment's pages
     updatedb          update crawl db from segments after fetching
   (CrawlDb commands)
     readdb            read / dump crawl db
     mergedb           merge crawldb-s, with optional filtering
     dedup             deduplicate entries in the crawldb and assign them a special status
     domainstats       calculate domain statistics from crawldb
     protocolstats     calculate protocol status code stats from crawldb
     crawlcomplete     calculate crawl completion stats from crawldb
   (Segment tools)
     freegen           generate a new segment to fetch from a URL text file
     readseg           read / dump segment data
     mergesegs         merge several segments, with optional filtering and slicing
   (HostDb commands)
     updatehostdb      update the host db with records from the crawl db
     readhostdb        read / dump host db
     sitemap           perform Sitemap processing
   (LinkDb commands)
     readlinkdb        read / dump link db
     invertlinks       create a linkdb from parsed segments
     mergelinkdb       merge linkdb-s, with optional filtering
   (Index commands)
     index             run the plugin-based indexer on parsed segments and linkdb
     clean             remove HTTP 301 and 404 documents and duplicates from indexing backends
   (Webgraph commands)
     webgraph          generate a web graph from existing segments
     linkrank          run a link analysis program on the generated web graph
     scoreupdater      updates the crawldb with linkrank scores
     nodedumper        dumps the web graph's node scores
   (Debugging and validation tools)
     parsechecker      check the parser for a given url
     indexchecker      check the indexing filters for a given url
     filterchecker     check url filters for a given url
     normalizerchecker check url normalizers for a given url
     robotsparser      parse a robots.txt file and check whether urls are allowed or not
     plugin            load a plugin and run one of its classes main()
     junit             runs the given JUnit test
     showproperties    print Nutch/Hadoop configuration properties to stdout
   (Data export)
     dump              exports crawled data from segments into files
     commoncrawldump   exports crawled data from segments into common crawl data format encoded as CBOR
     warc              exports crawled data from segments at the WARC format
   (Nutch Server)
     startserver       runs the Nutch Server on localhost:8081
   (or)
     CLASSNAME         run the main of the class named CLASSNAME
   Most commands print help when invoked w/o parameters.
   ```

> Group commands in bin/nutch command-line help
> ---------------------------------------------
>
>                 Key: NUTCH-3113
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3113
>             Project: Nutch
>          Issue Type: Improvement
>          Components: CLI
>    Affects Versions: 1.20
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.21
>
>
> The 38 commands in the command-line help of bin/nutch appear in a long,
> unstructured list. Grouping the commands thematically may help to find the
> appropriate command. A PR is under way, the output will look like:
> {noformat}
> (Crawl commands)
>   inject            inject new urls into the database
>   generate          generate new segments to fetch from crawl db
>   fetch             fetch a segment's pages
>   parse             parse a segment's pages
>   updatedb          update crawl db from segments after fetching
> (CrawlDb commands)
>   readdb            read / dump crawl db
>   mergedb           merge crawldb-s, with optional filtering
>   dedup             deduplicate entries in the crawldb and assign them a
>                     special status
>   domainstats       calculate domain statistics from crawldb
>   protocolstats     calculate protocol status code stats from crawldb
>   crawlcomplete     calculate crawl completion stats from crawldb
> (Segment tools)
> ...
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)