Sebastian Nagel created NUTCH-2859:
--------------------------------------
Summary: urlnormalizer-protocol: allow to normalize domains
Key: NUTCH-2859
URL: https://issues.apache.org/jira/browse/NUTCH-2859
Project: Nutch
Issue Type: Improvement
Components: plugin, urlnormalizer
Affects Versions: 1.18
Reporter: Sebastian Nagel
Fix For: 1.19
The plugin urlnormalizer-protocol normalizes the URL protocol/scheme for a
given list of hosts to the desired "normal" protocol (usually one of http or
https). It would be handy to be able to specify domain names as well, so that
all hosts/subdomains in a given domain are normalized.
To stay backward-compatible, this could be done by treating
{{*.example.org}} as a pattern that matches all hosts or subdomains of the
domain {{example.org}}.
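
A minimal sketch of how such matching might work, not the plugin's actual code. The class name {{ProtocolRules}} and its methods are hypothetical. The sketch assumes that an exact host rule takes precedence over a domain pattern, that the most specific domain pattern wins, and that {{*.example.org}} also covers the bare host {{example.org}}; the proposal does not specify any of these.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration only; not taken from urlnormalizer-protocol. */
public class ProtocolRules {
  // Rules for exact host names, e.g. "www.example.org" -> "https".
  private final Map<String, String> exactHosts = new HashMap<>();
  // Rules given as "*.example.org", stored under the key "example.org".
  private final Map<String, String> domainRules = new HashMap<>();

  /** Registers a rule; a leading "*." marks a domain pattern. */
  public void addRule(String pattern, String protocol) {
    String p = pattern.toLowerCase(Locale.ROOT);
    if (p.startsWith("*.")) {
      domainRules.put(p.substring(2), protocol);
    } else {
      exactHosts.put(p, protocol);
    }
  }

  /** Returns the desired protocol for the host, or null if no rule applies. */
  public String protocolFor(String host) {
    String candidate = host.toLowerCase(Locale.ROOT);
    // Assumption: an exact host rule wins over any domain pattern.
    String protocol = exactHosts.get(candidate);
    if (protocol != null) {
      return protocol;
    }
    // Strip one leading label at a time, so the most specific
    // domain pattern is found first: a.b.example.org, b.example.org, ...
    while (true) {
      protocol = domainRules.get(candidate);
      if (protocol != null) {
        return protocol;
      }
      int dot = candidate.indexOf('.');
      if (dot < 0) {
        return null;
      }
      candidate = candidate.substring(dot + 1);
    }
  }
}
```

Because the lookup compares whole labels rather than raw string suffixes, a host such as {{notexample.org}} would not be caught by {{*.example.org}}, and existing exact host entries keep their current meaning.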
--
This message was sent by Atlassian Jira
(v8.3.4#803005)