sebastian-nagel opened a new pull request, #852:
URL: https://github.com/apache/nutch/pull/852
New output:
```
$> $NUTCH_HOME/bin/nutch
nutch 1.21-SNAPSHOT
Usage: nutch COMMAND [-Dproperty=value]... [command-specific args]...
where COMMAND is one of:
(Crawl commands)
  inject             inject new urls into the database
  generate           generate new segments to fetch from crawl db
  fetch              fetch a segment's pages
  parse              parse a segment's pages
  updatedb           update crawl db from segments after fetching
(CrawlDb commands)
  readdb             read / dump crawl db
  mergedb            merge crawldb-s, with optional filtering
  dedup              deduplicate entries in the crawldb and assign them a
                     special status
  domainstats        calculate domain statistics from crawldb
  protocolstats      calculate protocol status code stats from crawldb
  crawlcomplete      calculate crawl completion stats from crawldb
(Segment tools)
  freegen            generate a new segment to fetch from a URL text file
  readseg            read / dump segment data
  mergesegs          merge several segments, with optional filtering and
                     slicing
(HostDb commands)
  updatehostdb       update the host db with records from the crawl db
  readhostdb         read / dump host db
  sitemap            perform Sitemap processing
(LinkDb commands)
  readlinkdb         read / dump link db
  invertlinks        create a linkdb from parsed segments
  mergelinkdb        merge linkdb-s, with optional filtering
(Index commands)
  index              run the plugin-based indexer on parsed segments and
                     linkdb
  clean              remove HTTP 301 and 404 documents and duplicates from
                     indexing backends
(Webgraph commands)
  webgraph           generate a web graph from existing segments
  linkrank           run a link analysis program on the generated web graph
  scoreupdater       updates the crawldb with linkrank scores
  nodedumper         dumps the web graph's node scores
(Debugging and validation tools)
  parsechecker       check the parser for a given url
  indexchecker       check the indexing filters for a given url
  filterchecker      check url filters for a given url
  normalizerchecker  check url normalizers for a given url
  robotsparser       parse a robots.txt file and check whether urls are
                     allowed or not
  plugin             load a plugin and run one of its classes main()
  junit              runs the given JUnit test
  showproperties     print Nutch/Hadoop configuration properties to stdout
(Data export)
  dump               exports crawled data from segments into files
  commoncrawldump    exports crawled data from segments into common crawl
                     data format encoded as CBOR
  warc               exports crawled data from segments at the WARC format
(Nutch Server)
  startserver        runs the Nutch Server on localhost:8081
(or)
  CLASSNAME          run the main of the class named CLASSNAME
Most commands print help when invoked w/o parameters.
```
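For orientation, the "(Crawl commands)" group in the help output above is typically run in the listed order to complete one crawl cycle. The following is only a dry-run sketch that prints the command lines instead of executing them. The directory names (`crawl/crawldb`, `crawl/segments`, `urls`), the `SEGMENT_NAME` placeholder, the fallback install path, and the `crawl_cycle` helper are assumptions for illustration and are not part of this pull request; the positional arguments follow the usual Nutch 1.x conventions and may differ between versions.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands of one crawl cycle, does not run Nutch.
# All paths below are hypothetical; adjust them to the local setup.
NUTCH="${NUTCH_HOME:-/opt/nutch}/bin/nutch"  # assumed install location
CRAWLDB="crawl/crawldb"                      # assumed crawl db directory
SEGMENTS="crawl/segments"                    # assumed segments directory
SEEDS="urls"                                 # assumed directory with seed URL files
SEGMENT="$SEGMENTS/SEGMENT_NAME"             # placeholder for the segment created by "generate"

# Hypothetical helper: echoes the five crawl commands in their usual order.
crawl_cycle() {
  echo "$NUTCH inject $CRAWLDB $SEEDS"
  echo "$NUTCH generate $CRAWLDB $SEGMENTS"
  echo "$NUTCH fetch $SEGMENT"
  echo "$NUTCH parse $SEGMENT"
  echo "$NUTCH updatedb $CRAWLDB $SEGMENT"
}

crawl_cycle
```

As the help text notes, most commands print their own usage when invoked without parameters, which is the safer way to confirm the exact arguments for a given Nutch version.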
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.