> It is not HttpClient reporting a wrong response status. It is the server
> behaving incorrectly. I get the same 404 when accessing the location
> directly.

What do you mean "directly"?

> The problem is that the server does not correctly handle URI
> fragment (the #axzz1pdAzTzT2 bit). The HTTP spec does not explicitly
> state how fragments in redirect locations should be handled. So, in my
> opinion it is a server side issue. 

In my opinion, if 5 clients (HttpURLConnection, HttpClient, Chrome, Safari,
Firefox) try to hit the URL, and 4 of them do so successfully and one does not,
the issue is with the one client, not with the server.  Many URLs are poorly
formed or ambiguous, yet most clients take extra steps to access them, which
makes them more useful.  I think that HttpClient should either do that or
provide facilities for doing so.
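
To illustrate the kind of extra step meant here: browsers do not send the
fragment to the server, so a client-side helper could drop it from the
redirect location before issuing the follow-up request.  A minimal sketch
using only java.net.URI; the class and method names are made up for
illustration and are not part of HttpClient:

```java
import java.net.URI;

/** Hypothetical helper for illustration; not an HttpClient class. */
final class FragmentStripper {

    private FragmentStripper() {}

    /** Returns the given URI without its "#fragment" part, if it has one. */
    static URI stripFragment(URI location) {
        if (location.getRawFragment() == null) {
            return location; // no fragment, nothing to strip
        }
        String s = location.toString();
        // In a successfully parsed URI, '#' can only be the fragment delimiter.
        return URI.create(s.substring(0, s.indexOf('#')));
    }

    public static void main(String[] args) {
        URI redirect = URI.create(
            "http://lavamagazine.com/features/"
            + "video-biking-the-ironman-melbourne-run-course/#axzz1pdAzTzT2");
        System.out.println(stripFragment(redirect));
    }
}
```

A custom redirect strategy could apply something along these lines to the
Location value before following it.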

> The URL has illegal character(s), which is the reason why the redirect
> fails. 

The Java toolkit and browsers URLEncode the URL, which avoids this problem. 
This seems like a good general approach when redirecting.
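
As a rough illustration of that approach (a sketch under my own assumptions,
not what HttpURLConnection or any browser literally does internally), the
redirect location could be percent-encoded leniently before it is parsed:
characters that are legal in a URI are kept, and everything else is escaped
as UTF-8 bytes.  The class and method names are hypothetical, and
StandardCharsets needs Java 7 or later:

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not an HttpClient class. */
final class RedirectLocationEncoder {

    // Punctuation kept as-is: RFC 3986 unreserved and reserved characters,
    // plus '%' so that escapes already present are not encoded twice.
    // Note: java.net.URI accepts '[' and ']' only in the host part.
    private static final String SAFE = "-._~:/?#[]@!$&'()*+,;=%";

    private RedirectLocationEncoder() {}

    /** Percent-encodes every byte that is not a legal URI character. */
    static String escapeIllegal(String location) {
        StringBuilder out = new StringBuilder(location.length() + 16);
        for (byte b : location.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean alnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum || SAFE.indexOf(c) >= 0) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u2019 is the curly apostrophe from the WSJ redirect.
        String location = "http://blogs.wsj.com/speakeasy/2012/03/22/"
                + "coroner-rules-whitney-houston\u2019s-death-an-accident/?mod=e2tw";
        URI uri = URI.create(escapeIllegal(location)); // parses after escaping
        System.out.println(uri);
    }
}
```

This only helps if the location string reaches the helper with its original
characters intact; a header that was already decoded with the wrong charset
would be escaped as whatever garbage the decoding produced.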

Randy

On Mar 24, 2012, at 7:59 PM, Oleg Kalnichevski wrote:

> On Sat, 2012-03-24 at 16:46 -0400, Uncle wrote:
>> On Mar 24, 2012, at 2:48 PM, Oleg Kalnichevski wrote:
>> 
>>> On Sat, 2012-03-24 at 08:50 -0400, Uncle wrote:
>>>> Apologies if this has been addressed, I searched the archives and was 
>>>> unable to find anything directly relating to this, though it seems 
>>>> straightforward.
>>>> 
>>>> I am trying to use httpclient to obtain the redirect URL for a url such as 
>>>> http://bit.ly/GGviSv, but I am getting a 404 error.  This is a "permanent" 
>>>> redirect (code 301).  This code:
>>>> 
>>>>       String url = "http://bit.ly/GGviSv";
>>>>       HttpGet httpget = new HttpGet(url);
>>>>       HttpContext context = new BasicHttpContext();
>>>>       HttpClient httpclient = new DefaultHttpClient();
>>>> 
>>>>       HttpResponse response = httpclient.execute(httpget, context);
>>>> 
>>>>       RedirectStrategy redirectStrategy = new DefaultRedirectStrategy();
>>>> 
>>>>       log.info("isRedirected = "
>>>>               + redirectStrategy.isRedirected(httpget, response, context));
>>>>       for(Header header : response.getAllHeaders())
>>>>           log.info("header: " + header);
>>>> 
>>>>       log.info("status = " + response.getStatusLine());
>>>> 
>>>> outputs:
>>>> 
>>>> isRedirected = false
>>>> header: Server: nginx
>>>> header: Date: Sat, 24 Mar 2012 12:38:43 GMT
>>>> header: Content-Type: text/html; charset=UTF-8
>>>> header: Transfer-Encoding: chunked
>>>> header: Connection: keep-alive
>>>> header: Vary: Cookie
>>>> header: X-CF-Powered-By: WP 1.2.0
>>>> header: X-Pingback: http://lavamagazine.com/xmlrpc.php
>>>> header: Expires: Wed, 11 Jan 1984 05:00:00 GMT
>>>> header: Last-Modified: Sat, 24 Mar 2012 12:38:43 GMT
>>>> header: Cache-Control: no-cache, must-revalidate, max-age=0
>>>> header: Pragma: no-cache
>>>> status = HTTP/1.1 404 Not Found
>>>> 
>>>> I expected 1) isRedirected to be true, 2) the response code to be 301, 
>>>> and/or 3) the destination URL to be in the headers where I could get it.  
>>>> However, if I ignore the 404 and continue getting the URL:
>>>> 
>>>>       HttpUriRequest currentReq = (HttpUriRequest)
>>>>               context.getAttribute(ExecutionContext.HTTP_REQUEST);
>>>>       HttpHost currentHost = (HttpHost)
>>>>               context.getAttribute(ExecutionContext.HTTP_TARGET_HOST);
>>>>       String currentUrl = currentReq.getURI().isAbsolute()
>>>>               ? currentReq.getURI().toString()
>>>>               : (currentHost.toURI() + currentReq.getURI());
>>>>       httpclient.getConnectionManager().shutdown();
>>>>       log.info("Redirected URL = " + currentUrl);
>>>> 
>>>> This does the right thing and provides me with the correct URL.  So, why 
>>>> the 404 error?  I am processing a large quantity of URLs and need to 
>>>> accurately determine which ones are errors, redirects, etc.
>>>> 
>>>> Thanks for any assistance.
>>>> 
>>>> Randy
>>>> 
>>> 
>>> As far as I can tell HttpClient correctly redirects to the new location,
>>> but the resource is simply no longer there.
>>> 
>>> [DEBUG] headers - >> GET /GGviSv HTTP/1.1
>>> [DEBUG] headers - >> Host: bit.ly
>>> [DEBUG] headers - >> Connection: Keep-Alive
>>> [DEBUG] headers - >> User-Agent: Apache-HttpClient/4.2-beta2-SNAPSHOT (java 1.5)
>>> [DEBUG] headers - << HTTP/1.1 301 Moved
>>> [DEBUG] headers - << Server: nginx
>>> [DEBUG] headers - << Date: Sat, 24 Mar 2012 18:46:44 GMT
>>> [DEBUG] headers - << Content-Type: text/html; charset=utf-8
>>> [DEBUG] headers - << Connection: keep-alive
>>> [DEBUG] headers - << Set-Cookie: _bit=4f6e1694-00156-016bf-3d1cf10a;domain=.bit.ly;expires=Thu Sep 20 18:46:44 2012;path=/; HttpOnly
>>> [DEBUG] headers - << Cache-control: private; max-age=90
>>> [DEBUG] headers - << Location: http://lavamagazine.com/features/video-biking-the-ironman-melbourne-run-course/#axzz1pdAzTzT2
>>> [DEBUG] headers - << MIME-Version: 1.0
>>> [DEBUG] headers - << Content-Length: 185
>>> [DEBUG] headers - >> GET /features/video-biking-the-ironman-melbourne-run-course/#axzz1pdAzTzT2 HTTP/1.1
>>> [DEBUG] headers - >> Host: lavamagazine.com
>>> [DEBUG] headers - >> Connection: Keep-Alive
>>> [DEBUG] headers - >> User-Agent: Apache-HttpClient/4.2-beta2-SNAPSHOT (java 1.5)
>>> [DEBUG] headers - << HTTP/1.1 404 Not Found
>>> [DEBUG] headers - << Server: nginx
>>> [DEBUG] headers - << Date: Sat, 24 Mar 2012 18:46:45 GMT
>>> [DEBUG] headers - << Content-Type: text/html; charset=UTF-8
>>> [DEBUG] headers - << Transfer-Encoding: chunked
>>> [DEBUG] headers - << Connection: keep-alive
>>> [DEBUG] headers - << Vary: Cookie
>>> [DEBUG] headers - << X-CF-Powered-By: WP 1.2.0
>>> [DEBUG] headers - << X-Pingback: http://lavamagazine.com/xmlrpc.php
>>> [DEBUG] headers - << Expires: Wed, 11 Jan 1984 05:00:00 GMT
>>> [DEBUG] headers - << Last-Modified: Sat, 24 Mar 2012 18:46:45 GMT
>>> [DEBUG] headers - << Cache-Control: no-cache, must-revalidate, max-age=0
>>> [DEBUG] headers - << Pragma: no-cache
>>> 
>>> Oleg
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>> 
>> 
>> Yet, if you hit the URL: 
>> 
>> http://lavamagazine.com/features/video-biking-the-ironman-melbourne-run-course/#axzz1pdAzTzT2
>> 
>> with your browser, the content comes up fine.  
>> 
>> Hitting the redirect URL with the standard Java HttpURLConnection class does 
>> not produce the 404:
>> 
>>        String url = "http://bit.ly/GGviSv";
>>        URL urlObj = new URL(url);
>>        HttpURLConnection urlConnection =
>>                (HttpURLConnection) urlObj.openConnection();
>>        urlConnection.setRequestMethod("GET");
>>        urlConnection.setConnectTimeout(15000);
>>        urlConnection.setReadTimeout(30000);
>>        urlConnection.connect();
>>        log.info("Response code = " + urlConnection.getResponseCode());
>>        InputStream inputStream = urlConnection.getInputStream();
>>        log.info("Redirected URL = " + urlConnection.getURL().toString());
>> 
>> This outputs:
>> 
>> Response code = 200
>> Redirected URL = 
>> http://lavamagazine.com/features/video-biking-the-ironman-melbourne-run-course/#axzz1pdAzTzT2
>> 
>> So HttpClient reports a 404, but HttpURLConnection reports a 200 and my 
>> browsers (Safari, Chrome, and Firefox) all hit the link fine.
>> 
> 
> It is not HttpClient reporting a wrong response status. It is the server
> behaving incorrectly. I get the same 404 when accessing the location
> directly. The problem is that the server does not correctly handle URI
> fragment (the #axzz1pdAzTzT2 bit). The HTTP spec does not explicitly
> state how fragments in redirect locations should be handled. So, in my
> opinion it is a server side issue. 
> 
> You can work around the problem by using a custom redirect strategy that
> rewrites the redirect location and strips away the fragment if present.
> 
> [DEBUG] headers - >> GET /features/video-biking-the-ironman-melbourne-run-course/#axzz1pdAzTzT2 HTTP/1.1
> [DEBUG] headers - >> Host: lavamagazine.com
> [DEBUG] headers - >> Connection: Keep-Alive
> [DEBUG] headers - >> User-Agent: Apache-HttpClient/4.2-beta2-SNAPSHOT (java 1.5)
> [DEBUG] headers - << HTTP/1.1 404 Not Found
> [DEBUG] headers - << Server: nginx
> [DEBUG] headers - << Date: Sat, 24 Mar 2012 23:31:10 GMT
> [DEBUG] headers - << Content-Type: text/html; charset=UTF-8
> [DEBUG] headers - << Transfer-Encoding: chunked
> [DEBUG] headers - << Connection: keep-alive
> [DEBUG] headers - << Vary: Cookie
> [DEBUG] headers - << X-CF-Powered-By: WP 1.2.0
> [DEBUG] headers - << X-Pingback: http://lavamagazine.com/xmlrpc.php
> [DEBUG] headers - << Expires: Wed, 11 Jan 1984 05:00:00 GMT
> [DEBUG] headers - << Last-Modified: Sat, 24 Mar 2012 23:31:10 GMT
> [DEBUG] headers - << Cache-Control: no-cache, must-revalidate, max-age=0
> [DEBUG] headers - << Pragma: no-cache
> 
> 
>> Here is another URL that is problematic:
>> 
>> http://on.wsj.com/GHGlfS
>> 
>> this produces:
>> 
>> org.apache.http.client.ClientProtocolException
>>      at 
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:822)
>>      at 
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
>> ... snip ...
>> Caused by: org.apache.http.ProtocolException: Invalid redirect URI: 
>> http://blogs.wsj.com/speakeasy/2012/03/22/coroner-rules-whitney-houstonâ??s-death-an-accident/?mod=e2tw
>>      at 
>> org.apache.http.impl.client.DefaultRedirectStrategy.createLocationURI(DefaultRedirectStrategy.java:185)
>>      at 
>> org.apache.http.impl.client.DefaultRedirectStrategy.getLocationURI(DefaultRedirectStrategy.java:116)
>>      at 
>> org.apache.http.impl.client.DefaultRedirectStrategy.getRedirect(DefaultRedirectStrategy.java:193)
>>      at 
>> org.apache.http.impl.client.DefaultRequestDirector.handleResponse(DefaultRequestDirector.java:1035)
>>      at 
>> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:492)
>>      at 
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
>>      ... 28 more
>> Caused by: java.net.URISyntaxException: Illegal character in path at index 
>> 72: 
>> http://blogs.wsj.com/speakeasy/2012/03/22/coroner-rules-whitney-houstonâ??s-death-an-accident/?mod=e2tw
>>      at java.net.URI$Parser.fail(URI.java:2809)
>>      at java.net.URI$Parser.checkChars(URI.java:2982)
>>      at java.net.URI$Parser.parseHierarchical(URI.java:3066)
>>      at java.net.URI$Parser.parse(URI.java:3014)
>>      at java.net.URI.<init>(URI.java:578)
>>      at 
>> org.apache.http.impl.client.DefaultRedirectStrategy.createLocationURI(DefaultRedirectStrategy.java:183)
>>      ... 33 more
>> 
>> The redirected URL has a special character in it (a curly single quote, 
>> which shows up as %e2%80%99 in the output below), and the client doesn't 
>> handle that.  The Java code that I pasted above produces
>> 
> 
> The URL has illegal character(s), which is the reason why the redirect
> fails. 
> 
> Oleg
> 
>> Response code = 200
>> Redirected URL = 
>> http://blogs.wsj.com/speakeasy/2012/03/22/coroner-rules-whitney-houston%e2%80%99s-death-an-accident/?%3fs-death-an-accident/%3fmod=e2tw
>> 
>> Randy
>> 
>> 
>> 
> 
> 
> 
> 

