[ https://issues.apache.org/jira/browse/NUTCH-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315631#comment-17315631 ]
ASF GitHub Bot commented on NUTCH-2858:
---------------------------------------
sebastian-nagel merged pull request #575:
URL: https://github.com/apache/nutch/pull/575
> urlnormalizer-protocol: URL port is lost during normalization
> -------------------------------------------------------------
>
> Key: NUTCH-2858
> URL: https://issues.apache.org/jira/browse/NUTCH-2858
> Project: Nutch
> Issue Type: Bug
> Components: plugin, urlnormalizer
> Affects Versions: 1.18
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.19
>
>
> If a URL includes a port, e.g. {{http://example.com:8080/}} or
> {{https://example.com:8443/}}, the port is removed when the URL is normalized
> by urlnormalizer-protocol.
> Instead, if a port is set explicitly,
> - the port should be kept as is, and
> - the protocol should remain unchanged, because
> -* keeping the port while changing the protocol may result in a connection failure, and
> -* unlike the default port mapping (80 for http <> 443 for https), non-default port mappings (e.g. 8080 <> 8443) are risky and unlikely to work on every server setup.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)