On Wed, 2011-06-29 at 12:21 +0200, khiem nguyen wrote:
> Hi, I tried to retrieve the content of this link:
>
> http://de.tommy.com//Sale/600000,de_DE,sc.html
>
> and got a circular redirect. Logging tells me that HttpClient fires:
>
> GET /Sale/600000,de_DE,sc.html
>
> and the server responds with a redirect back to
> http://de.tommy.com//Sale/600000,de_DE,sc.html
>
> wget behaves like a browser and gives back the content.
>
> With telnet:
>
> telnet de.tommy.com 80
> Trying 89.202.105.72...
> Connected to de.tommy.com.
> Escape character is '^]'.
> GET /Sale/600000,de_DE,sc.html HTTP/1.1
> Host:de.tommy.com
>
> HTTP/1.1 301 Moved Permanently
> Date: Wed, 29 Jun 2011 10:11:15 GMT
> Server: Apache
> Content-Length: 0
> Set-Cookie: dwsid=CvVvWMuShdGfstjxicXY9lJb8Fk8gkMT8xV8zGEU_X1Y81Rt4F-469BS_cTJZ4hHcE7f5NVeacb1VKcXHFEKGg==; path=/; HttpOnly
> Cache-Control: no-cache,no-store,must-revalidate
> Pragma: no-cache
> Expires: Thu, 01 Dec 1994 16:00:00 GMT
> Location: http://de.tommy.com//Sale/600000,de_DE,sc.html
> Vary: Accept-Encoding
> Accept-Ranges: bytes
> Content-Type: text/plain
>
> Connection closed by foreign host.
> -----
>
> de.tommy.com 80
> Trying 89.202.105.72...
> Connected to de.tommy.com.
> Escape character is '^]'.
> GET //Sale/600000,de_DE,sc.html HTTP/1.1
> Host: de.tommy.com
>
> HTTP/1.1 200 OK
> Date: Wed, 29 Jun 2011 10:07:11 GMT
> Server: Apache
> Set-Cookie: ....
> ....content
>
> ...
>
> It seems like HttpClient strips out one of the two slashes.
> Is it a bug, or is the server misconfigured? (I guess they use rewrite
> or something, but it's not rare.)
>
> How can I fix this?
> Thanks
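The slash stripping described in the question can be reproduced with the JDK alone. This is a minimal sketch, assuming (not confirmed in this thread) that the client normalizes URIs through java.net.URI.normalize(): that method collapses consecutive slashes, so the Location value normalizes back to exactly the URI that was just requested, http://de.tommy.com/Sale/600000,de_DE,sc.html, and the client goes round in a circle. The class name DoubleSlashDemo is made up for illustration.

```java
import java.net.URI;

// Shows how java.net.URI.normalize() handles the doubled slash from the
// Location header in this thread. That HttpClient itself calls normalize()
// is an assumption, not something confirmed here.
class DoubleSlashDemo {

    // Returns the normalized string form of the given URI.
    static String normalized(String uri) {
        return URI.create(uri).normalize().toString();
    }

    public static void main(String[] args) {
        String location = "http://de.tommy.com//Sale/600000,de_DE,sc.html";

        // The raw path keeps both slashes.
        System.out.println(URI.create(location).getRawPath());
        // prints //Sale/600000,de_DE,sc.html

        // normalize() collapses them, yielding the very URI that was just
        // requested and redirected from, hence the redirect loop.
        System.out.println(normalized(location));
        // prints http://de.tommy.com/Sale/600000,de_DE,sc.html
    }
}
```

Note that the server treats /Sale/... and //Sale/... as different resources, which is why any normalization step on the client side turns a single redirect into a loop.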
The redirect returned by the server is malformed. See
http://www.ietf.org/rfc/rfc2396.txt

---
3.3. Path Component

   The path component contains data, specific to the authority (or the
   scheme if there is no authority component), identifying the resource
   within the scope of that scheme and authority.

      path          = [ abs_path | opaque_part ]

      path_segments = segment *( "/" segment )
      segment       = *pchar *( ";" param )
      param         = *pchar

      pchar         = unreserved | escaped |
                      ":" | "@" | "&" | "=" | "+" | "$" | ","

   The path may consist of a sequence of path segments separated by a
   single slash "/" character.  Within a path segment, the characters
   "/", ";", "=", and "?" are reserved.  Each path segment may include a
   sequence of parameters, indicated by the semicolon ";" character.
   The parameters are not significant to the parsing of relative
   references.
---

The path element of the URI is not supposed to have multiple consecutive
slashes. Such URIs are ambiguous, and whichever way HttpClient tries to
normalize them, it cannot get it right all the time.

You have two options here: turn off automatic redirect handling and
handle redirects manually, or build a custom RedirectStrategy.

Hope this helps

Oleg
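The first option, handling redirects manually, could look roughly like the loop below: it takes the Location header verbatim and only resolves it against the current URI, with no normalization. This is a minimal sketch under stated assumptions: ManualRedirects, Response, Fetcher and fakeServer are hypothetical names, not HttpClient API, and the fake server simply replays the telnet transcript from the question. A real Fetcher would wrap a client with automatic redirects turned off (in HttpClient 4.x, the ClientPNames.HANDLE_REDIRECTS parameter set to false), and you would still need to check that it sends the doubled slash unchanged on the wire.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal sketch of manual redirect handling that keeps the Location header
// verbatim. All names here are hypothetical, not HttpClient API.
class ManualRedirects {

    // Status code and Location header of one response.
    static final class Response {
        final int status;
        final String location; // null if the response has no Location header

        Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    // Sends exactly one request to the given URI without following redirects.
    // A real implementation would wrap a client with automatic redirects off
    // and rethrow any IOException as an unchecked exception.
    interface Fetcher {
        Response fetch(URI uri);
    }

    // Follows at most maxRedirects 3xx responses and returns the final URI.
    static URI follow(Fetcher fetcher, URI start, int maxRedirects) {
        Set<String> seen = new HashSet<>();
        URI current = start;
        for (int hops = 0; hops <= maxRedirects; hops++) {
            // Compare exact strings: "//Sale/..." and "/Sale/..." stay distinct.
            if (!seen.add(current.toString())) {
                throw new IllegalStateException("Circular redirect to " + current);
            }
            Response r = fetcher.fetch(current);
            boolean redirect = r.status >= 300 && r.status < 400 && r.location != null;
            if (!redirect) {
                return current;
            }
            // resolve() returns an absolute Location unchanged; no normalize().
            current = current.resolve(r.location);
        }
        throw new IllegalStateException("More than " + maxRedirects + " redirects");
    }

    // Fake server modelled on the telnet transcript in this thread.
    static Fetcher fakeServer() {
        Map<String, Response> byPath = new HashMap<>();
        byPath.put("/Sale/600000,de_DE,sc.html",
                new Response(301, "http://de.tommy.com//Sale/600000,de_DE,sc.html"));
        byPath.put("//Sale/600000,de_DE,sc.html", new Response(200, null));
        return uri -> byPath.getOrDefault(uri.getRawPath(), new Response(404, null));
    }

    public static void main(String[] args) {
        URI start = URI.create("http://de.tommy.com/Sale/600000,de_DE,sc.html");
        System.out.println(follow(fakeServer(), start, 5));
        // prints http://de.tommy.com//Sale/600000,de_DE,sc.html
    }
}
```

The visited set is keyed on the exact URI string, so a server that genuinely bounces between two URIs is still caught, while //Sale/... and /Sale/... count as different targets, matching how this server behaves.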
