[Nutch Wiki] Trivial Update of "bin/nutch_readdb" by MarkusJelsma

Apache Wiki Mon, 09 Jan 2012 08:15:38 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.


The "bin/nutch_readdb" page has been changed by MarkusJelsma:
http://wiki.apache.org/nutch/bin/nutch_readdb?action=diff&rev1=14&rev2=15

  '''-stats''': This prints the overall statistics to System.out.
  
  '''-dump <out_dir>''': Dumps the whole crawldb to a text file in the 
<out_dir> we specify.
+ 
  '''[-regex <expr>]''': Filters records with a regular expression.
+ 
  '''[-status <status>]''': Filters records by CrawlDatum status.
  
  '''-topN <nnnn> <out_dir> [<min>]''': This dumps the top <nnnn> urls, sorted 
by score, to any <out_dir> we wish to specify. If the optional [<min>] 
parameter is passed, the reader skips records with scores below this 
particular value. This can significantly improve performance when retrieving 
the results.
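
The options above could be combined roughly as in the following sketch. The crawldb path (crawl/crawldb), the output directory names, the regular expression, the status value db_unfetched and the minimum score 0.5 are illustrative assumptions, not values taken from this page. The small run helper is hypothetical too: it only prints each command when no executable bin/nutch is found, so the script can be tried without an installation.

```shell
#!/bin/sh
# Assumed locations; adjust to your own installation and crawl directory.
NUTCH="${NUTCH:-bin/nutch}"
CRAWLDB="${CRAWLDB:-crawl/crawldb}"

# Hypothetical helper: run the command if Nutch is present,
# otherwise print it (dry run).
run() {
  if [ -x "$NUTCH" ]; then
    "$NUTCH" "$@"
  else
    printf '%s\n' "$NUTCH $*"
  fi
}

# Overall statistics, printed to standard output.
run readdb "$CRAWLDB" -stats

# Dump the whole crawldb into an output directory.
run readdb "$CRAWLDB" -dump dump_all

# Dump only records whose URL matches a regular expression (example pattern).
run readdb "$CRAWLDB" -dump dump_apache -regex '.*apache\.org.*'

# Dump only records with a given CrawlDatum status (example status).
run readdb "$CRAWLDB" -dump dump_unfetched -status db_unfetched

# Top 10 urls by score, skipping records scoring below 0.5 (example values).
run readdb "$CRAWLDB" -topN 10 top10 0.5
```

Each output directory name is passed as given, so it is probably safest to use a fresh directory per run rather than reusing an existing one.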
