[ https://issues.apache.org/jira/browse/CONNECTORS-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright resolved CONNECTORS-1155.
-------------------------------------
Resolution: Fixed
Fix Version/s: ManifoldCF 1.8.2, ManifoldCF 2.0.2
> Web connector should not be sending the port number in request header field
> Host
> --------------------------------------------------------------------------------
>
> Key: CONNECTORS-1155
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1155
> Project: ManifoldCF
> Issue Type: Bug
> Components: Web connector
> Affects Versions: ManifoldCF 1.7.2
> Reporter: Denis Beck
> Assignee: Karl Wright
> Fix For: ManifoldCF 2.0.2, ManifoldCF 1.8.2
>
>
> The web connector sends the port number in the request header field Host
> (e.g. Host: www.apache.org:443). This causes redirect rules for the host name
> to fail. The port number should not be part of the Host header.
> On the other hand, RFC 2616 section 14.23
> (http://tools.ietf.org/html/rfc2616#section-14.23) says “The Host
> request-header field specifies the Internet host and port number of the
> resource being requested [...]”.
> I encountered this issue while trying to crawl a customer’s website. The very
> first call to the seed URL caused a redirect that pointed back to the
> original URL itself, and the job ended without fetching anything. The Simple
> History showed only status 301, nothing else. Maybe the web connector does
> not follow the link in the redirect correctly?
> I could not trigger the redirect with any other client: I tried a browser
> and cURL. ManifoldCF's web connector was the only client sending the port
> number in the Host header, and it could not crawl the website because of
> this behavior.
> This issue could be worked around by collaborating with the contractor who
> hosts the customer's website; he added an exception for these requests. But
> in general, I think this should be fixed, as such collaboration is not always
> possible.
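
For illustration only, the behavior requested above can be sketched as a small
helper that leaves the port out of the Host value whenever it is the default
for the scheme. RFC 2616 section 14.23 also says that a host without trailing
port information implies the default port, so omitting it should be safe. This
is not the actual ManifoldCF change: the class and method names (HostHeader,
defaultPort, hostHeaderFor) are hypothetical, and only http and https are
handled.

```java
import java.net.URI;

// Hypothetical helper, not taken from the ManifoldCF code base.
class HostHeader {

    // Default port for the schemes this sketch knows about, or -1 otherwise.
    static int defaultPort(String scheme) {
        if ("http".equalsIgnoreCase(scheme)) {
            return 80;
        }
        if ("https".equalsIgnoreCase(scheme)) {
            return 443;
        }
        return -1;
    }

    // Builds the Host header value for a URI, omitting the port when it is
    // absent or equal to the scheme's default port.
    static String hostHeaderFor(URI uri) {
        String host = uri.getHost();
        int port = uri.getPort(); // -1 when the URI carries no explicit port
        if (port == -1 || port == defaultPort(uri.getScheme())) {
            return host;
        }
        return host + ":" + port;
    }

    public static void main(String[] args) {
        // Default HTTPS port: the port is dropped.
        System.out.println(hostHeaderFor(URI.create("https://www.apache.org:443/")));
        // Non-default port: the port is kept.
        System.out.println(hostHeaderFor(URI.create("https://www.apache.org:8443/")));
    }
}
```

With the example from the report, https://www.apache.org:443/ would yield
"Host: www.apache.org", while a non-default port such as 8443 would still be
sent, since the server needs it to identify the requested resource.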