Sebastian Nagel created NUTCH-2859:
--------------------------------------

             Summary: urlnormalizer-protocol: allow to normalize domains
                 Key: NUTCH-2859
                 URL: https://issues.apache.org/jira/browse/NUTCH-2859
             Project: Nutch
          Issue Type: Improvement
          Components: plugin, urlnormalizer
    Affects Versions: 1.18
            Reporter: Sebastian Nagel
             Fix For: 1.19


The plugin urlnormalizer-protocol normalizes the URL protocol/scheme for a 
given list of hosts to the desired "normal" protocol (usually one of http or 
https). It would be handy to also allow domain names to be specified, so that
all hosts/subdomains in a given domain are normalized.

To stay backward-compatible, this could be done by treating 
{{*.example.org}} as a pattern that matches all hosts or subdomains of the 
domain {{example.org}}.
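
A minimal sketch of how such a lookup might work, not the plugin's actual code. 
The class name ProtocolNormalizerSketch and its methods addRule and protocolFor 
are hypothetical. The sketch also assumes that {{*.example.org}} covers the bare 
domain {{example.org}} itself, which the proposal above leaves open. Rules 
without the {{*.}} prefix keep their current exact-host meaning.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: exact host rules plus "*.domain" wildcard rules. */
public class ProtocolNormalizerSketch {

  // Exact host name -> protocol (the existing, backward-compatible behavior).
  private final Map<String, String> hostProtocols = new HashMap<>();
  // Domain name (pattern without the leading "*.") -> protocol.
  private final Map<String, String> domainProtocols = new HashMap<>();

  /** Registers a rule; a pattern starting with "*." is treated as a domain rule. */
  public void addRule(String pattern, String protocol) {
    String p = pattern.toLowerCase(Locale.ROOT);
    if (p.startsWith("*.")) {
      domainProtocols.put(p.substring(2), protocol);
    } else {
      hostProtocols.put(p, protocol);
    }
  }

  /** Returns the protocol configured for the host, or null if no rule applies. */
  public String protocolFor(String host) {
    String h = host.toLowerCase(Locale.ROOT);

    // Exact host rules take precedence over domain rules.
    String protocol = hostProtocols.get(h);
    if (protocol != null) {
      return protocol;
    }

    // Strip one leading label at a time and look for a matching domain rule.
    // Assumption: the bare domain itself also matches its "*." pattern.
    String suffix = h;
    while (true) {
      protocol = domainProtocols.get(suffix);
      if (protocol != null) {
        return protocol;
      }
      int dot = suffix.indexOf('.');
      if (dot < 0) {
        return null;
      }
      suffix = suffix.substring(dot + 1);
    }
  }
}
```

With the rules {{example.com -> http}} and {{*.example.org -> https}}, this 
sketch maps {{www.example.org}} and {{a.b.example.org}} to https, while 
{{www.example.com}} stays unmatched because {{example.com}} is an exact-host 
rule. Matching on whole labels keeps {{notexample.org}} from matching 
{{*.example.org}}.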



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
