Hi,

> I just tried it with telnet and can confirm the redirects, though
> note that the redirect does not happen if you exclude the :80 or
> :443 port information from Host.
>
> It would appear that those sites have been configured to
> canonicalize the URI used for access, and thus are directing
> your client not to include the port info in Host. RFC 2616
> section 14.23 Host:
>
> The Host request-header field specifies the Internet host and port
> number of the resource being requested, as obtained from the original
> URI given by the user or referring resource (generally an HTTP URL,
> as described in section 3.2.2). The Host field value MUST represent
> the naming authority of the origin server or gateway given by the
> original URL.

thank you for this hint. I changed the client here, and it seems to work
better now.

But I disagree with your interpretation of the specification. 14.23 says
"port number [...] as obtained from the original URI", and 3.2.2 says "If
the port is empty or not given, port 80 is assumed". So the client
"obtained" the port number by "assuming" that the standard port is meant.

Additionally, 3.2.3 (URI comparison) says: "[...] octet-by-octet
comparison of the entire URIs, with these exceptions: A port that is empty
or not given is equivalent to the default port for that URI-reference
[...]". So the Apache httpd redirects to an *equivalent* URI while the
Location header field would only allow a redirection to something "*other*
than the Request-URI" (14.30). I read "equivalent" to mean "basically the
same", not "other".

All this is why I originally implemented it this way. Maybe I
misinterpreted the spec. Should it be clarified with an erratum, or am I
just completely unreasonable? As mentioned earlier, the problem only
occurred with Apache httpd, not with other server programs.

> Also, you should be
> sending HTTP/1.1 requests, not HTTP/1.0.

The client is able to do this, but I experienced problems with the
HTTP/1.1 implementation of some httpd servers. So the client uses HTTP/1.0
by default nowadays, and users can configure it to use HTTP/1.1 for
servers where they want it.

Sometimes, the problems are plain bugs where a server sends garbage. But
there are also performance problems; for example, I just revisited
<http://directory.fsf.org:80/gcc.html>, which seems to use the (admittedly
old) Apache httpd 1.3.26 and sends some chunks with sizes in the range
20..50. I found cases where an httpd sent literally hundreds of such
ridiculously small chunks for one single resource, but this might not (or
not only) have been Apache httpd and I'm currently not able to recover
examples.

Best wishes,
Arne

-- 
www.arne-thomassen.de
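
As an illustration of the comparison rule from 3.2.3 discussed above, a
minimal sketch in Python might look like the following. The names
canonical, equivalent and host_header are hypothetical and are not taken
from the client or from Apache httpd. The percent-encoding exception of
3.2.3 is left out, and the https default port is an assumption added for
illustration.

```python
from urllib.parse import urlsplit

# Default ports per scheme; 80 for http is from RFC 2616 section 3.2.2,
# 443 for https is an assumption added for this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical(uri):
    """Reduce a URI to a tuple that can be compared for equivalence.

    Covers these exceptions from RFC 2616 section 3.2.3: a port that is
    not given equals the default port, host names and scheme names are
    case-insensitive, and an empty abs_path equals "/".
    Percent-encoding equivalence is NOT handled here.
    """
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # urlsplit reports a port that is not given as None.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    path = parts.path or "/"
    return (scheme, host, port, path, parts.query)


def equivalent(a, b):
    """True if the two URIs are equivalent under the rules sketched above."""
    return canonical(a) == canonical(b)


def host_header(uri, include_default_port=False):
    """Build a Host field value (hypothetical helper).

    By default the port is omitted when it is the scheme's default port.
    """
    scheme, host, port, _, _ = canonical(uri)
    is_default = port == DEFAULT_PORTS.get(scheme)
    if port is None or (is_default and not include_default_port):
        return host
    return f"{host}:{port}"
```

Under these assumptions, equivalent("http://directory.fsf.org:80/gcc.html",
"http://directory.fsf.org/gcc.html") is true, which is the sense of
"equivalent" argued above. host_header() omits the default port unless
include_default_port=True is passed, which corresponds to the Host form
that did not trigger the redirect.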
