[ https://issues.apache.org/jira/browse/NUTCH-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315671#comment-17315671 ]

Hudson commented on NUTCH-2858:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #31 (See [https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/31/])
NUTCH-2858 urlnormalizer-protocol: URL port is lost during normalization (snagel: [https://github.com/apache/nutch/commit/c454a640f97efc9e16da1b4a910fab6d85103f95])
* (edit) src/plugin/urlnormalizer-protocol/src/test/org/apache/nutch/net/urlnormalizer/protocol/TestProtocolURLNormalizer.java
* (edit) src/plugin/urlnormalizer-protocol/src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java
* (edit) src/plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java
NUTCH-2858 urlnormalizer-protocol: URL port is lost during normalization (snagel: [https://github.com/apache/nutch/commit/d7499205a7ad5eb0d324af7e027b9fdd051d8d22])
* (edit) conf/protocols.txt.template


> urlnormalizer-protocol: URL port is lost during normalization
> -------------------------------------------------------------
>
>                 Key: NUTCH-2858
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2858
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin, urlnormalizer
>    Affects Versions: 1.18
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.19
>
>
> If a URL includes a port, e.g. {{http://example.com:8080/}} or
> {{https://example.com:8443/}}, the port is removed when the URL is
> normalized by the urlnormalizer-protocol plugin.
> Instead, if the port is set,
> - the port should be kept as is and
> - the protocol should be left unchanged:
>    * keeping the port while changing the protocol might result in a
>      connection failure
>    * unlike the default port mapping (80 (http) <> 443 (https)),
>      non-default port mappings (8080 <> 8443) are risky and unlikely to
>      work on every server setup.
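
The requested behavior can be sketched as follows. This is a minimal, standalone illustration and not the code committed to ProtocolURLNormalizer.java: the class name PortAwareProtocolSketch, the in-memory PROTOCOLS table, and the normalize method are hypothetical, and the sketch assumes the normalizer is driven by a simple host-to-protocol mapping.

```java
import java.net.URI;
import java.util.Map;

public class PortAwareProtocolSketch {

    // Hypothetical host -> preferred protocol table (an assumption for this sketch).
    private static final Map<String, String> PROTOCOLS = Map.of("example.com", "https");

    /** Returns the URL with the preferred protocol, or unchanged if a port is set. */
    public static String normalize(String urlString) {
        URI uri = URI.create(urlString);
        String scheme = uri.getScheme();
        String host = uri.getHost();

        // Relative or opaque URIs carry no scheme or host to act on: leave as is.
        if (scheme == null || host == null) {
            return urlString;
        }

        // An explicit port means both port and protocol stay untouched:
        // switching the protocol on a non-default port may break the connection.
        if (uri.getPort() != -1) {
            return urlString;
        }

        String target = PROTOCOLS.get(host);
        if (target == null || target.equals(scheme)) {
            return urlString;
        }

        // Swap only the scheme prefix; the rest of the URL is kept verbatim.
        return target + urlString.substring(scheme.length());
    }
}
```

Under these assumptions, a URL without a port on a listed host has its protocol rewritten, while any URL that carries an explicit port, default or not, is returned exactly as given.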



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
