Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by RobPettengill:
http://wiki.apache.org/nutch/bin/nutch_dedup

New page:
dedup is an alias for net.nutch.indexer.!DeleteDuplicates

Deletes duplicate documents in a set of Lucene indexes. Two documents count as 
duplicates if they have the same content (compared by MD5 hash) or the same URL.
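
The duplicate rule above can be illustrated with a small standalone sketch. This is not the DeleteDuplicates source and it does not touch Lucene indexes. The Doc class, the DedupSketch class, and the "keep the first document seen" policy are all assumptions made for illustration; the real tool may choose which copy to keep differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {
    // Hypothetical document record; not a Nutch or Lucene type.
    static final class Doc {
        final String url;
        final String content;
        Doc(String url, String content) {
            this.url = url;
            this.content = content;
        }
    }

    // Returns the hex-encoded MD5 digest of the content (UTF-8 bytes).
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Drops a document if its URL or its content hash matches a document
    // that was already kept. Assumption: the first copy seen wins.
    static List<Doc> dedup(List<Doc> docs) {
        Set<String> seenUrls = new HashSet<>();
        Set<String> seenHashes = new HashSet<>();
        List<Doc> kept = new ArrayList<>();
        for (Doc d : docs) {
            String hash = md5Hex(d.content);
            if (seenUrls.contains(d.url) || seenHashes.contains(hash)) {
                continue;
            }
            seenUrls.add(d.url);
            seenHashes.add(hash);
            kept.add(d);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("http://example.com/a", "hello"),
            new Doc("http://example.com/b", "hello"),     // same content as /a
            new Doc("http://example.com/a", "different"), // same URL as /a
            new Doc("http://example.com/c", "world"));
        for (Doc d : dedup(docs)) {
            System.out.println(d.url);
        }
    }
}
```

Running main prints the /a and /c URLs: the second document is dropped for having the same content hash, and the third for having the same URL.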

Usage: bin/nutch net.nutch.indexer.!DeleteDuplicates (-local | -ndfs 
<namenode:port>) [-workingdir <workingdir>] <segmentsDir>

[CommandLineOptions]
