Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "bin/nutch_readdb" page has been changed by MarkusJelsma:
http://wiki.apache.org/nutch/bin/nutch_readdb?action=diff&rev1=15&rev2=16

  '''-dump <out_dir>''': Enables us to dump the whole crawldb to a text file in any <out_dir> we wish to specify.
- '''[-regex <expr>]: filter records with a regular expression
+ '''[-regex <expr>]''': filter records with a regular expression
- '''[-status <status>]: filter records by CrawlDatum status
+ '''[-status <status>]''': filter records by CrawlDatum status
  '''-topN <nnnn> <out_dir> [<min>]''': This dumps the top <nnnn> urls sorted by score relevance to any <out_dir> we wish to specify. If the [<min>] parameter is passed in the command, the reader will skip records with scores below this particular value. This can significantly improve retrieval performance of statistics or crawldb dump results.
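To illustrate what the -regex, -status and -topN [<min>] options select, here is a minimal Python sketch that models the crawldb as an in-memory list. It is not Nutch code: `CrawlRecord`, `dump` and `top_n` are hypothetical names, and the sketch assumes the regular expression must match the whole URL (Python's `re.fullmatch`), which this page does not state. Status names such as `db_fetched` are used only as examples.

```python
import re
from dataclasses import dataclass


@dataclass
class CrawlRecord:
    # Hypothetical stand-in for a crawldb entry (URL, CrawlDatum status name, score).
    # This is NOT Nutch's CrawlDatum class.
    url: str
    status: str
    score: float


def dump(records, regex=None, status=None):
    """Model the -dump filters: keep records whose URL matches `regex`
    (assumed whole-URL match) and whose status equals `status`, if given."""
    pattern = re.compile(regex) if regex is not None else None
    kept = []
    for r in records:
        if pattern is not None and not pattern.fullmatch(r.url):
            continue  # -regex filter: URL does not match the expression
        if status is not None and r.status != status:
            continue  # -status filter: wrong CrawlDatum status
        kept.append(r)
    return kept


def top_n(records, n, min_score=None):
    """Model -topN <nnnn> [<min>]: skip records scoring below min_score,
    then return the URLs of the n highest-scoring remaining records."""
    kept = [r for r in records if min_score is None or r.score >= min_score]
    kept.sort(key=lambda r: r.score, reverse=True)
    return [r.url for r in kept[:n]]


if __name__ == "__main__":
    db = [
        CrawlRecord("http://a.example/x", "db_fetched", 2.0),
        CrawlRecord("http://b.example/y", "db_unfetched", 0.5),
        CrawlRecord("http://a.example/z", "db_unfetched", 1.0),
    ]
    print([r.url for r in dump(db, regex=r"http://a\.example/.*", status="db_unfetched")])
    print(top_n(db, 5, min_score=1.0))
```

Applying the <min> threshold before sorting means low-scoring records never enter the ranking, which is consistent with the page's note that the threshold can speed up retrieval.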

