[ https://issues.apache.org/jira/browse/NUTCH-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315630#comment-17315630 ]
ASF GitHub Bot commented on NUTCH-2859:
---------------------------------------
sebastian-nagel merged pull request #576:
URL: https://github.com/apache/nutch/pull/576
> urlnormalizer-protocol: allow to normalize domains
> --------------------------------------------------
>
> Key: NUTCH-2859
> URL: https://issues.apache.org/jira/browse/NUTCH-2859
> Project: Nutch
> Issue Type: Improvement
> Components: plugin, urlnormalizer
> Affects Versions: 1.18
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Major
> Fix For: 1.19
>
>
> The urlnormalizer-protocol plugin normalizes the URL protocol/scheme for a
> given list of hosts to the desired "normal" protocol (usually http or
> https). It would be handy to also allow domain names to be specified, so
> that all hosts/subdomains in a given domain are normalized.
> To stay backward-compatible, this could be done by matching
> {{*.example.org}} as a pattern for all hosts or subdomains of the domain
> {{example.org}}.
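
The matching proposed in the description could be sketched roughly as follows. This is only an illustration in Java, not the code merged in pull request #576: the class ProtocolRules and its methods addRule and protocolFor are hypothetical names, and the sketch assumes that a {{*.example.org}} pattern also covers the bare host example.org, which the description does not state explicitly.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Nutch: maps hosts and "*.domain" patterns
// to a target protocol.
class ProtocolRules {
    // Exact host rules, e.g. "www.example.com" -> "http"
    private final Map<String, String> hostRules = new HashMap<>();
    // Domain rules registered as "*.example.org", stored without the "*." prefix
    private final Map<String, String> domainRules = new HashMap<>();

    void addRule(String pattern, String protocol) {
        String p = pattern.toLowerCase(Locale.ROOT);
        if (p.startsWith("*.")) {
            domainRules.put(p.substring(2), protocol);
        } else {
            // Plain host names keep their old meaning (backward-compatible).
            hostRules.put(p, protocol);
        }
    }

    // Returns the configured protocol for the host, or null if no rule applies.
    String protocolFor(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        String protocol = hostRules.get(h);
        if (protocol != null) {
            return protocol;
        }
        // Try the host itself, then each parent domain:
        // a.b.example.org -> b.example.org -> example.org -> org
        // Assumption: "*.example.org" also matches the bare host "example.org".
        String suffix = h;
        while (true) {
            protocol = domainRules.get(suffix);
            if (protocol != null) {
                return protocol;
            }
            int dot = suffix.indexOf('.');
            if (dot < 0) {
                return null;
            }
            suffix = suffix.substring(dot + 1);
        }
    }

    public static void main(String[] args) {
        ProtocolRules rules = new ProtocolRules();
        rules.addRule("*.example.org", "https");
        rules.addRule("www.example.com", "http");
        System.out.println(rules.protocolFor("a.b.example.org"));
        System.out.println(rules.protocolFor("www.example.com"));
        System.out.println(rules.protocolFor("example.com"));
    }
}
```

In this sketch exact host rules are checked before domain patterns, so a host-specific entry can override a broader domain rule. Whether the plugin should behave that way is a design choice the description leaves open.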