> How to avoid duplicate content?

You can use the org.apache.nutch.crawl.TextProfileSignature implementation instead of the default MD5Signature, or provide your own Signature implementation.
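In Nutch the signature class is selected with the db.signature.class property, which you would set to org.apache.nutch.crawl.TextProfileSignature in nutch-site.xml. MD5Signature hashes the raw page content, so it only matches byte-identical pages. TextProfileSignature instead hashes a quantized profile of word frequencies from the extracted text, which lets near-identical pages collide. The sketch below is a simplified, standalone approximation of that idea, written only to show the technique. It is not Nutch's code and does not implement Nutch's Signature class. The class name TextProfileSketch, the two constants, and the tie-breaking order are assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical standalone sketch of a text-profile signature (not Nutch's class).
public class TextProfileSketch {
    // Assumed parameters: ignore very short tokens, quantize counts at 1% of the max.
    static final int MIN_TOKEN_LEN = 2;
    static final float QUANT_RATE = 0.01f;

    public static byte[] signature(String text) {
        // 1. Tokenize on letters/digits, lowercase, and count token frequencies.
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder cur = new StringBuilder();
        for (char c : (text + " ").toCharArray()) { // trailing space flushes the last token
            if (Character.isLetterOrDigit(c)) {
                cur.append(Character.toLowerCase(c));
            } else {
                if (cur.length() >= MIN_TOKEN_LEN) {
                    counts.merge(cur.toString(), 1, Integer::sum);
                }
                cur.setLength(0);
            }
        }
        // 2. Derive a quantization step from the most frequent token.
        int max = 0;
        for (int n : counts.values()) max = Math.max(max, n);
        int quant = Math.round(max * QUANT_RATE);
        if (quant < 2) quant = (max > 1) ? 2 : 1;
        // 3. Round counts down to multiples of the step; drop tokens that fall to zero.
        List<Map.Entry<String, Integer>> profile = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int q = (e.getValue() / quant) * quant;
            if (q > 0) profile.add(Map.entry(e.getKey(), q));
        }
        // 4. Sort by count (descending), then token, so the profile is deterministic.
        profile.sort((a, b) -> a.getValue().equals(b.getValue())
                ? a.getKey().compareTo(b.getKey())
                : b.getValue() - a.getValue());
        // 5. Hash the textual profile, so similar pages map to the same digest.
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : profile) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(sb.toString().getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

For example, "The cat sat on the mat." and "the CAT sat, on the MAT!" produce the same signature here even though their bytes differ, which MD5Signature would not allow. The quantization step is the design choice that trades precision for tolerance: rare words drop out of the profile, so on very short texts the profile can become coarse. A real custom implementation for Nutch would instead extend the Signature class of your Nutch version and be registered through db.signature.class.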
Jérôme -- http://motrech.free.fr/ http://www.frutch.org/