Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "bin/nutch_readdb" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/bin/nutch_readdb?action=diff&rev1=8&rev2=9

Comment:
Update to reflect Nutch 1.3 API

  The CrawlDbReader implements all the read-only parts of accessing our web database. It provides us with a read utility for the CrawlDB.
  
  Usage: 
+ 
  {{{
- bin/nutch org.apache.nutch.crawl.CrawlDbReader (-local | -ndfs <namenode:port>) <db> [-pageurl url] | [-pagemd5 md5] | [-dumppageurl] | [-dumppagemd5] | [-toppages <k>] | [-linkurl url] | [-linkmd5 md5] | [-dumplinks] | [-stats]
+ bin/nutch org.apache.nutch.crawl.CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
- }}}
+ }}} 
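
To make the option grammar of the usage line concrete, here is a minimal Java sketch that checks a command line against it. The class `ReaderArgs`, its `Mode` enum and its `parse` method are hypothetical names for illustration only; they are not part of Nutch. The sketch follows the usage line literally: exactly one option per run, and no score threshold when [<min>] is omitted. The real CrawlDbReader may accept its options more leniently.

```java
// Hypothetical sketch, NOT Nutch source: validates arguments against the usage line
// <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
final class ReaderArgs {
    enum Mode { STATS, DUMP, TOP_N, URL }

    final String crawlDb;
    final Mode mode;
    final String outDir; // set for -dump and -topN only
    final long topN;     // set for -topN only
    final float min;     // -topN threshold; NEGATIVE_INFINITY means "skip nothing"
    final String url;    // set for -url only

    private ReaderArgs(String crawlDb, Mode mode, String outDir, long topN, float min, String url) {
        this.crawlDb = crawlDb;
        this.mode = mode;
        this.outDir = outDir;
        this.topN = topN;
        this.min = min;
        this.url = url;
    }

    static ReaderArgs parse(String... args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                "usage: <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)");
        }
        String db = args[0];
        switch (args[1]) {
            case "-stats":
                expectLength(args, 2, 2);
                return new ReaderArgs(db, Mode.STATS, null, 0, Float.NEGATIVE_INFINITY, null);
            case "-dump":
                expectLength(args, 3, 3);
                return new ReaderArgs(db, Mode.DUMP, args[2], 0, Float.NEGATIVE_INFINITY, null);
            case "-topN":
                expectLength(args, 4, 5);
                // The optional fifth argument is the <min> score threshold.
                float min = args.length == 5 ? Float.parseFloat(args[4]) : Float.NEGATIVE_INFINITY;
                return new ReaderArgs(db, Mode.TOP_N, args[3], Long.parseLong(args[2]), min, null);
            case "-url":
                expectLength(args, 3, 3);
                return new ReaderArgs(db, Mode.URL, null, 0, Float.NEGATIVE_INFINITY, args[2]);
            default:
                throw new IllegalArgumentException("unknown option: " + args[1]);
        }
    }

    // Rejects argument counts outside [lo, hi] for the chosen option.
    private static void expectLength(String[] args, int lo, int hi) {
        if (args.length < lo || args.length > hi) {
            throw new IllegalArgumentException("wrong number of arguments for " + args[1]);
        }
    }
}
```

For example, `ReaderArgs.parse("crawl/crawldb", "-topN", "100", "top_out", "0.5")` selects `TOP_N` with `topN` 100 and `min` 0.5. The paths here are illustrative.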
  
- '''(-local | -ndfs <namenode:port>)''':
+ '''<crawldb>''': The location of the crawldb directory we wish to read and obtain information from.
  
- '''<db>''':
+ '''-stats''': This prints overall statistics about the crawldb to System.out.
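
As a rough illustration of the kind of aggregation -stats performs, the sketch below counts entries, tracks the min/avg/max score and tallies entries per status. It is a hypothetical in-memory model. `DbEntry` stands in for a crawldb record; the real records are `CrawlDatum` objects processed by a Hadoop job. Neither the fields nor the printed labels are claimed to match the real tool's output.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, NOT Nutch source: aggregates of the sort -stats reports.
final class CrawlDbStatsSketch {
    // Illustrative stand-in for one crawldb record.
    static final class DbEntry {
        final String url;
        final String status; // e.g. "db_unfetched", "db_fetched"
        final float score;

        DbEntry(String url, String status, float score) {
            this.url = url;
            this.status = status;
            this.score = score;
        }
    }

    static final class Stats {
        long total;
        double scoreSum;
        float minScore = Float.POSITIVE_INFINITY;
        float maxScore = Float.NEGATIVE_INFINITY;
        final Map<String, Long> byStatus = new TreeMap<>(); // sorted by status name

        double avgScore() {
            return total == 0 ? 0.0 : scoreSum / total;
        }
    }

    // One pass over all records, updating every aggregate as we go.
    static Stats compute(List<DbEntry> entries) {
        Stats s = new Stats();
        for (DbEntry e : entries) {
            s.total++;
            s.scoreSum += e.score;
            s.minScore = Math.min(s.minScore, e.score);
            s.maxScore = Math.max(s.maxScore, e.score);
            s.byStatus.merge(e.status, 1L, Long::sum);
        }
        return s;
    }

    // Prints the aggregates to System.out; labels are illustrative only.
    static void print(Stats s) {
        System.out.println("total entries: " + s.total);
        if (s.total > 0) {
            System.out.println("min score: " + s.minScore);
            System.out.println("avg score: " + s.avgScore());
            System.out.println("max score: " + s.maxScore);
        }
        for (Map.Entry<String, Long> e : s.byStatus.entrySet()) {
            System.out.println("status " + e.getKey() + ": " + e.getValue());
        }
    }
}
```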
  
- '''[-pageurl url]''':
+ '''-dump <out_dir>''': This dumps the whole crawldb as plain text into any <out_dir> we wish to specify.
  
- '''[-pagemd5 md5]''':
+ '''-topN <nnnn> <out_dir> [<min>]''': This dumps the top <nnnn> urls, sorted by score, to any <out_dir> we wish to specify. If the optional [<min>] parameter is passed, the reader skips records whose score is below this value. Because those records are dropped before sorting, this can significantly reduce the time the -topN dump takes on a large crawldb.
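
The effect of [<min>] on -topN can be sketched as three steps: filter, then sort, then truncate. This is a hypothetical in-memory Java model (`TopNSketch` and `Scored` are illustrative names). The actual reader runs over the crawldb as Hadoop jobs. The ordering of the steps is the point here: records below <min> are discarded before they ever reach the sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, NOT Nutch source: -topN <nnnn> with an optional [<min>].
final class TopNSketch {
    // Illustrative (url, score) pair standing in for a crawldb record.
    static final class Scored {
        final String url;
        final float score;

        Scored(String url, float score) {
            this.url = url;
            this.score = score;
        }
    }

    // Returns at most n entries, highest score first. Entries scoring below
    // min are skipped; pass Float.NEGATIVE_INFINITY to keep everything.
    static List<Scored> topN(List<Scored> entries, int n, float min) {
        List<Scored> kept = new ArrayList<>();
        for (Scored s : entries) {
            if (s.score < min) {
                continue; // filtered out early, so it is never sorted
            }
            kept.add(s);
        }
        kept.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        return new ArrayList<>(kept.subList(0, Math.min(n, kept.size())));
    }
}
```

Passing `Float.NEGATIVE_INFINITY` as `min` models the case where [<min>] is omitted from the command.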
  
- '''[-dumppageurl]''':
+ '''-url <url>''': This prints the stored information about a particular <url> to System.out.
  
- '''[-dumppagemd5]''':
  
- '''[-toppages <k>]''':
- 
- '''[-linkurl url]''':
- 
- '''[-linkmd5 md5]''':
- 
- '''[-dumplinks]''':
- 
- '''[-stats]''':
  
  CommandLineOptions
  
