[ 
https://issues.apache.org/jira/browse/CONNECTORS-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1155.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: ManifoldCF 1.8.2
                   ManifoldCF 2.0.2

> Web connector should not be sending the port number in request header field 
> Host
> --------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1155
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1155
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>    Affects Versions: ManifoldCF 1.7.2
>            Reporter: Denis Beck
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 2.0.2, ManifoldCF 1.8.2
>
>
> The web connector sends the port number in the request header field Host 
> (e.g. Host: www.apache.org:443). This causes redirect rules for the host name 
> to fail. The port number should not be part of the Host header.
> On the other hand, RFC 2616 section 14.23 
> (http://tools.ietf.org/html/rfc2616#section-14.23) says “The Host 
> request-header field specifies the Internet host and port number of the 
> resource being requested [...]”.
> I encountered this issue while trying to crawl a customer’s website. The very 
> first call to the seed URL caused a redirect which contained a link to the 
> original URL itself and the job ended without fetching anything. The Simple 
> History showed only Status 301. Maybe the web connector does not follow 
> the link in the redirect correctly?
> The redirect couldn't be triggered otherwise: I tried a browser and cURL. 
> ManifoldCF's web connector was the only one sending the port number with the 
> Host header and wasn't able to crawl the website due to this behavior.
> This issue could be worked around by collaborating with the contractor who 
> hosted the customer's website; he added an exception for these requests. But 
> in general, I think this should be fixed, as such collaboration is not always 
> possible.
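
For illustration only: the same section of RFC 2616 also says that a host without trailing port information implies the default port for the service requested, so leaving out a default port is permitted. The sketch below shows one way a client could build the Host value under that rule. It is not the ManifoldCF fix; the names host_header and DEFAULT_PORTS are hypothetical and exist only for this example.

```python
from urllib.parse import urlsplit

# Default ports per scheme (assumption: only http and https matter here).
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header(url: str) -> str:
    """Return a Host header value for url, omitting the scheme's default port."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    # urlsplit strips the brackets from IPv6 literals; put them back.
    if ":" in host:
        host = f"[{host}]"
    port = parts.port  # None when the URL carries no explicit port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{port}"
```

With this helper, https://www.apache.org:443/ and https://www.apache.org/ both yield the same Host value, while a non-default port such as 8080 is kept.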



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
