urlnormalizer-unalias plugin
----------------------------

Key: NUTCH-737
URL: https://issues.apache.org/jira/browse/NUTCH-737
Project: Nutch
Issue Type: New Feature
Affects Versions: 1.0.0
Reporter: Dmitry Lihachev
I looked for tools that detect whole-site duplication but found none. This plugin performs domain name transformation (for example, www.google.com -> google.com). It is very simple, but it can be useful for dealing with site aliases. To detect the site aliases themselves, I use my own rough class (based on SolrDeleteDuplicates).
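To illustrate the kind of host rewriting described above, here is a minimal sketch. It assumes aliases are supplied as an explicit alias-to-canonical host table. The class `UnaliasNormalizer` and its methods `addAlias` and `normalize` are hypothetical and do not come from the NUTCH-737 patch. A real plugin would also need to be wired into Nutch's URL normalizer extension point, which this sketch leaves out.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of an "unalias" URL normalizer (not the NUTCH-737 code):
 * rewrites a URL's host using a table of alias host -> canonical host.
 */
public class UnaliasNormalizer {
    // Alias host (e.g. "www.google.com") -> canonical host (e.g. "google.com"), stored lowercased.
    private final Map<String, String> aliases = new HashMap<>();

    public void addAlias(String aliasHost, String canonicalHost) {
        aliases.put(aliasHost.toLowerCase(), canonicalHost.toLowerCase());
    }

    /** Returns the URL with an aliased host replaced; otherwise returns it unchanged. */
    public String normalize(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return url; // unparseable input: leave it untouched
        }
        String host = uri.getHost();
        if (host == null) {
            return url; // opaque or host-less URI (e.g. mailto:)
        }
        String canonical = aliases.get(host.toLowerCase());
        if (canonical == null) {
            return url; // not a known alias
        }
        // Rebuild from the raw (still-encoded) components so percent-escapes are preserved.
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(canonical);
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        sb.append(uri.getRawPath());
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        UnaliasNormalizer n = new UnaliasNormalizer();
        n.addAlias("www.google.com", "google.com");
        System.out.println(n.normalize("http://www.google.com/search?q=nutch")); // prints http://google.com/search?q=nutch
        System.out.println(n.normalize("http://example.org/"));                   // prints http://example.org/
    }
}
```

An explicit table is used here, rather than blindly stripping a `www.` prefix, because `www.example.com` and `example.com` are not guaranteed to serve the same site. The mapping would come from whatever alias detection runs beforehand.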