[ https://issues.apache.org/jira/browse/NUTCH-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitry Lihachev updated NUTCH-737:
----------------------------------

    Attachment: NUTCH-737_urlfilter_unalias.patch

> urlnormalizer-unalias plugin
> ----------------------------
>
>                 Key: NUTCH-737
>                 URL: https://issues.apache.org/jira/browse/NUTCH-737
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.0.0
>            Reporter: Dmitry Lihachev
>            Priority: Minor
>         Attachments: NUTCH-737_urlfilter_unalias.patch
>
>
> I searched for tools that detect whole-site duplication, without success.
> This plugin performs domain name transformation (for example,
> www.google.com -> google.com). It is very naive, but can be useful when
> dealing with site aliases. To detect site aliases, I use my own ugly class
> (based on SolrDeleteDuplicates).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
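The domain name transformation described in the issue (www.google.com -> google.com) could look roughly like the sketch below. This is not the code from the attached patch and does not use the Nutch plugin API; the class name `UnaliasSketch`, the method `unalias`, and the hard-coded alias table are all hypothetical, chosen only to illustrate the idea with the standard `java.net.URI` class.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the implementation in NUTCH-737.
final class UnaliasSketch {

    // Assumed alias table: alias host -> canonical host.
    // A real plugin would presumably load this from configuration.
    private static final Map<String, String> ALIASES =
            Map.of("www.google.com", "google.com");

    // Returns the URL with its host replaced by the canonical host,
    // or the input unchanged if the host is not a known alias.
    static String unalias(String url) {
        final URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return url; // leave unparsable URLs untouched
        }
        String host = uri.getHost();
        if (uri.getScheme() == null || host == null) {
            return url; // relative or opaque URI: nothing to rewrite
        }
        String canonical = ALIASES.get(host.toLowerCase(Locale.ROOT));
        if (canonical == null) {
            return url;
        }
        // Rebuild from the raw (still percent-encoded) components so that
        // the path and query are not decoded and re-encoded along the way.
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(canonical);
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        if (uri.getRawPath() != null) {
            sb.append(uri.getRawPath());
        }
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }
}
```

With this table, `UnaliasSketch.unalias("http://www.google.com/search?q=nutch")` yields `http://google.com/search?q=nutch`, while URLs on hosts that are not listed pass through unchanged. An explicit table is used here rather than blindly stripping a `www.` prefix, since the issue talks about site aliases in general; that choice is an assumption of this sketch.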