Sebastian Nagel created NUTCH-2858:
--------------------------------------
Summary: urlnormalizer-protocol: URL port is lost during
normalization
Key: NUTCH-2858
URL: https://issues.apache.org/jira/browse/NUTCH-2858
Project: Nutch
Issue Type: Bug
Components: plugin, urlnormalizer
Affects Versions: 1.18
Reporter: Sebastian Nagel
Assignee: Sebastian Nagel
Fix For: 1.19
If a URL includes a port, e.g. {{http://example.com:8080/}} or
{{https://example.com:8443/}}, the port is removed when the URL is normalized
by the urlnormalizer-protocol plugin.
Instead, if the port is set,
- the port should be kept as is and
- the protocol should be left unchanged, because
-* keeping the port while changing the protocol might result in a connection
failure
-* unlike the default port mapping (80 (http) <> 443 (https)), non-default
port mappings (e.g. 8080 <> 8443) are risky and unlikely to work on every
server setup.
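The proposed behavior can be sketched as follows. This is only an illustration and not the plugin's actual code: the class {{PortAwareProtocolSketch}}, the method {{normalize}} and the {{targetProtocol}} parameter are hypothetical names, and the protocol the normalizer would pick for a host is assumed to be passed in directly. The sketch uses only {{java.net.URI}} from the JDK.

```java
import java.net.URI;

// Hypothetical sketch of the proposed rule, not the urlnormalizer-protocol code.
class PortAwareProtocolSketch {

    /**
     * Returns urlString with its protocol switched to targetProtocol,
     * but only if the URL carries no explicit port. URLs with an explicit
     * port are returned unchanged (port and protocol both kept).
     */
    static String normalize(String urlString, String targetProtocol) {
        URI uri = URI.create(urlString);
        String scheme = uri.getScheme();

        // Relative URLs have no scheme: nothing to normalize.
        if (scheme == null) {
            return urlString;
        }
        // URI.getPort() returns -1 when no port is given in the URL.
        if (uri.getPort() != -1) {
            return urlString;
        }
        // Protocol already matches the target: nothing to do.
        if (scheme.equalsIgnoreCase(targetProtocol)) {
            return urlString;
        }
        // Replace only the scheme; the rest of the URL is copied verbatim.
        return targetProtocol + urlString.substring(scheme.length());
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://example.com/", "https"));
        System.out.println(normalize("http://example.com:8080/", "https"));
        System.out.println(normalize("https://example.com:8443/", "http"));
    }
}
```

With this rule, {{http://example.com/}} would be rewritten to {{https://example.com/}}, while {{http://example.com:8080/}} and {{https://example.com:8443/}} would pass through untouched.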