Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "bin/nutch_readdb" page has been changed by MarkusJelsma:
http://wiki.apache.org/nutch/bin/nutch_readdb?action=diff&rev1=14&rev2=15

  '''-stats''': This prints the overall statistics to System.out.
  
  '''-dump <out_dir>''': Enables us to dump the whole crawldb to a text file in 
any <out_dir> we wish to specify.
+ 
  '''[-regex <expr>]''': Filter records with a regular expression.
+ 
  '''[-status <status>]''': Filter records by CrawlDatum status.
  
  '''-topN <nnnn> <out_dir> [<min>]''': This dumps the top <nnnn> urls sorted 
by score to any <out_dir> we wish to specify. If the optional [<min>] 
parameter is passed in the command, the reader will skip records with scores 
below this particular value. This can significantly improve the performance 
of retrieving statistics or crawldb dump results.
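
The selection rule described for -topN can be illustrated with a small sketch. This is not Nutch's implementation, only a Python model of the behaviour stated above: keep the <nnnn> highest-scoring records and, when a minimum is given, skip anything scoring below it. The function name top_n, the (url, score) record shape, and the sample URLs are assumptions made for illustration.

```python
import heapq


def top_n(records, n, min_score=None):
    """Return up to n (url, score) pairs, highest score first.

    Hypothetical helper: models the -topN rule described above,
    not the actual Nutch CrawlDbReader code.
    """
    if min_score is not None:
        # Skip records whose score is below the threshold before ranking.
        records = ((url, score) for url, score in records if score >= min_score)
    # nlargest returns the n highest-scoring records in descending order.
    return heapq.nlargest(n, records, key=lambda rec: rec[1])


# Made-up sample data for illustration only.
sample = [
    ("http://a.example/", 0.2),
    ("http://b.example/", 1.5),
    ("http://c.example/", 0.9),
]

print(top_n(sample, 2, min_score=0.5))
```

Filtering before ranking means low-scoring records never reach the ranking step, which is one plausible reading of why passing <min> can speed things up.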
