[Nutch Wiki] Update of "CommandLineOptions" by LewisJohnMcgibbney

Apache Wiki Tue, 14 Jun 2011 13:57:42 -0700

The "CommandLineOptions" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/CommandLineOptions?action=diff&rev1=12&rev2=13

- = Command Line Options of bin/nutch =
+ = Nutch 1.3 Command Line Options of bin/nutch =
  
  The script bin/nutch is a helper which picks different java classes to "run". 
  
  See each entry for details of the command arguments and options.
  
  ||'''command'''||'''function'''||
- ||[[bin/nutch_admin]]||{OLD/Removed} Web page and link database administration, including creation||
- ||[[bin/nutch_analyze]]||{OLD/Removed} Adjust database link-analysis scoring||
- ||[[bin/nutch_crawl]]||Perform complete crawling and indexing of a set of root urls||
- ||[[bin/nutch_datanode]]||{OLD/Removed} NDFS data node||
- ||[[bin/nutch_dedup]]||Deletes duplicate documents in a set of segment indexes||
+ ||[[bin/nutch_crawl]]||One-step crawler for intranets||
+ ||[[bin/nutch_convdb]]||Convert crawl db from pre-0.9 format||
+ ||[[bin/nutch mergedb]]||Merge crawldb-s, with optional filtering||
+ ||[[bin/nutch readlinkdb]]||Read / dump link db||
+ ||[[bin/nutch_inject]]||Inject new urls into the database||
+ ||[[bin/nutch_generate]]||Generate new segments to fetch from crawl db||
+ ||[[bin/nutch_freegen]]||Generate new segments to fetch from text files||
  ||[[bin/nutch_fetch]]||Fetch a segment's pages||
- ||[[bin/nutch_fetchlist]]||{OLD/Removed}Print the fetchlist of a segment||
- ||[[bin/nutch_generate]]||Generate new segments to fetch||
- ||[[bin/nutch_index]]||Run the indexer on a segment's fetcher output||
- ||[[bin/nutch_inject]]||Inject new urls into the web page and link database||
- ||[[bin/nutch_merge]]||Merge several segment indexes||
+ ||[[bin/nutch_parse]]||Parse a segment's pages||
+ ||[[bin/nutch_readseg]]||Read / dump segment data||
+ ||[[bin/nutch_mergesegs]]||Merges multiple segments, with optional filtering and slicing||
+ ||[[bin/nutch_updatedb]]||Update crawl db from segments after fetching||
+ ||[[bin/nutch_invertlinks]]||Create a linkdb from parsed segments||
- ||[[bin/nutch mergedbs]]||merge crawldb-s, with optional filtering||
+ ||[[bin/nutch_mergelinkdb]]||Merges linkdb-s, with optional filtering||
+ ||[[bin/nutch solrindex]]||Run the solr indexer on parsed segments and linkdb||
- ||[[bin/nutch_mergesegs]]||Merges multiple segments & removes duplicates||
- ||[[bin/nutch_namenode]]||{OLD/Removed} NDFS name node||
- ||[[bin/nutch_ndfs]]||{OLD/Removed} NDFS administrative access||
- ||[[bin/nutch_parse]]||Parse contents in one segment||
- ||[[bin/nutch_prune]]||Prunes existing Nutch indexes of unwanted content||
- ||[[bin/nutch_readdb]]||Read data from the web page and link db||
- ||[[bin/nutch_segread]]||Read data in an existing segment||
- ||[[bin/nutch_segslice]]||Divide data from one segment into several segments||
- ||[[bin/nutch_server]]||Run a search server of IPC connections||
- ||[[bin/nutch solrdedup]]||Deletes duplicate documents from solr||
+ ||[[bin/nutch solrdedup]]||Removes duplicate documents from solr||
- ||[[bin/nutch solrclean]]||Deletes 404 documents from solr||
+ ||[[bin/nutch solrclean]]||Removes HTTP 301 and 404 documents from solr||
- ||[[bin/nutch_updatedb]]||Updates the web page and link db from the segment fetcher output||
+ ||[[bin/nutch plugin]]||Load a plugin and run one of its classes' main()||
+ or
+ ||[[bin/nutch CLASSNAME]]||Run the class named CLASSNAME||
  ||                          ||                                               ||
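The page describes bin/nutch as a helper that picks a Java class to run for each command, with a fallback that runs an arbitrary CLASSNAME. The following is a minimal Python sketch of that dispatch idea, not the actual script. The function names (resolve_class, build_java_command) are hypothetical, and the class names in the mapping are assumptions recalled from the Nutch 1.x sources that should be checked against the real bin/nutch before being relied on.

```python
# Hypothetical sketch of command-to-class dispatch in the style of bin/nutch.
# The class names below are assumptions; verify them against the real script.
COMMAND_CLASSES = {
    "inject": "org.apache.nutch.crawl.Injector",
    "generate": "org.apache.nutch.crawl.Generator",
    "fetch": "org.apache.nutch.fetcher.Fetcher",
    "parse": "org.apache.nutch.parse.ParseSegment",
    "updatedb": "org.apache.nutch.crawl.CrawlDb",
    "invertlinks": "org.apache.nutch.crawl.LinkDb",
}


def resolve_class(command):
    """Map a known command to its Java class; otherwise treat it as CLASSNAME."""
    return COMMAND_CLASSES.get(command, command)


def build_java_command(argv):
    """Build the argument list a launcher might pass to the JVM.

    argv is the list of arguments after the script name, for example
    ["inject", "crawl/crawldb", "urls"].
    """
    if not argv:
        raise ValueError("usage: nutch COMMAND [args...]")
    command, *rest = argv
    return ["java", resolve_class(command), *rest]


if __name__ == "__main__":
    print(build_java_command(["inject", "crawl/crawldb", "urls"]))
```

Unknown commands fall through unchanged, which mirrors the "bin/nutch CLASSNAME" row of the table: anything that is not a recognised command is taken to be a fully qualified class name. The real script also sets up the classpath and JVM options, which this sketch omits.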
