On Sun, 2012-03-25 at 14:19 -0400, Uncle wrote:
> > It is not HttpClient reporting a wrong response status. It is the server
> > behaving incorrectly. I get the same 404 when accessing the location
> > directly.
> 
> What do you mean "directly"?
> 

Without redirect.

> > The problem is that the server does not correctly handle URI
> > fragment (the #axzz1pdAzTzT2 bit). The HTTP spec does not explicitly
> > state how fragments in redirect locations should be handled. So, in my
> > opinion it is a server side issue. 
> 
> In my opinion, if 5 clients (HttpURLConnection, HttpClient, Chrome, Safari, 
> Firefox) try to hit the URL, and 4 of them do so successfully and one does 
> not, the issue is with the one client, not with the server.  Many URLs are 
> poorly formed or ambiguous, yet most clients take extra steps to access them, 
> which makes them more useful. 

HttpClient is not a browser but you are certainly entitled to have a
different opinion. 

>  I think that HttpClient should either do that or provide facilities for 
> doing so.
> 

It does. One can handle redirects differently by implementing a custom
RedirectStrategy and rewriting malformed redirect URIs in a way which is
acceptable in the context of a specific application.
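
For the fragment case, a minimal sketch of the kind of rewriting such a
strategy could apply is shown here. It uses only java.net so it runs on its
own. The class and method names (FragmentStripper, stripFragment) are
hypothetical and not part of HttpClient. Calling it from a
DefaultRedirectStrategy subclass, for example by overriding
createLocationURI, is an assumption you would need to check against the
HttpClient version in use.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of HttpClient.
class FragmentStripper {

    // Returns the location with any "#fragment" suffix removed.
    // Fragments are meant for the client side, so dropping one before
    // sending the request does not change which resource is requested.
    static String stripFragment(String location) {
        int hash = location.indexOf('#');
        return hash >= 0 ? location.substring(0, hash) : location;
    }

    public static void main(String[] args) throws URISyntaxException {
        String location = "http://lavamagazine.com/features/"
                + "video-biking-the-ironman-melbourne-run-course/#axzz1pdAzTzT2";
        // Parse the cleaned location to confirm it is a valid URI.
        URI target = new URI(stripFragment(location));
        System.out.println(target);
    }
}
```

A location without a fragment is returned unchanged, so the helper is safe
to apply to every redirect.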

> > The URL has illegal character(s), which is the reason why the redirect
> > fails. 
> 
> The Java toolkit and browsers URLEncode the URL, which avoids this problem. 
> This seems like a good general approach when redirecting.
> 

See above.
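
For the illegal-character case, a rough sketch of a rewrite that a custom
strategy could apply before parsing the location is shown here. It
percent-encodes every byte that is not a plain URI character, assuming the
location string already holds the intended characters and that UTF-8 is the
right encoding for them. LocationEncoder and encodeIllegal are hypothetical
names, not HttpClient API, and this is not a full URI validator.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient.
class LocationEncoder {

    // Characters passed through untouched: RFC 3986 unreserved and reserved
    // characters, plus '%' so existing escapes are not encoded twice.
    private static final String SAFE =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
            + "-._~:/?#[]@!$&'()*+,;=%";

    // Percent-encodes every UTF-8 byte that is not in SAFE.
    static String encodeIllegal(String location) {
        StringBuilder out = new StringBuilder();
        for (byte b : location.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && SAFE.indexOf((char) v) >= 0) {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumed input: the redirect location with a right single quote.
        String location = "http://blogs.wsj.com/speakeasy/2012/03/22/"
                + "coroner-rules-whitney-houston\u2019s-death-an-accident/?mod=e2tw";
        // The encoded form parses without a URISyntaxException.
        System.out.println(new URI(encodeIllegal(location)));
    }
}
```

Whether this kind of lenient rewriting is appropriate depends on the
application, which is why it belongs in a custom strategy rather than in the
default one.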

Oleg

> Randy
> 
> On Mar 24, 2012, at 7:59 PM, Oleg Kalnichevski wrote:
> 
> > On Sat, 2012-03-24 at 16:46 -0400, Uncle wrote:
> >> On Mar 24, 2012, at 2:48 PM, Oleg Kalnichevski wrote:
> >> 
> >>> On Sat, 2012-03-24 at 08:50 -0400, Uncle wrote:
> >>>> Apologies if this has been addressed, I searched the archives and was 
> >>>> unable to find anything directly relating to this, though it seems 
> >>>> straightforward.
> >>>> 
> >>>> I am trying to use httpclient to obtain the redirect URL for a url such 
> >>>> as http://bit.ly/GGviSv, but I am getting a 404 error.  This is a 
> >>>> "permanent" redirect (code 301).  This code:
> >>>> 
> >>>>       String url = "http://bit.ly/GGviSv";
> >>>>       HttpGet httpget = new HttpGet(url);
> >>>>       HttpContext context = new BasicHttpContext();
> >>>>       HttpClient httpclient = new DefaultHttpClient();
> >>>> 
> >>>>       HttpResponse response = httpclient.execute(httpget, context);
> >>>> 
> >>>>       RedirectStrategy redirectStrategy = new DefaultRedirectStrategy();
> >>>> 
> >>>>       log.info("isRedirected = " + redirectStrategy.isRedirected(httpget, response, context));
> >>>>       for(Header header : response.getAllHeaders())
> >>>>           log.info("header: " + header);
> >>>> 
> >>>>       log.info("status = " + response.getStatusLine());
> >>>> 
> >>>> outputs:
> >>>> 
> >>>> isRedirected = false
> >>>> header: Server: nginx
> >>>> header: Date: Sat, 24 Mar 2012 12:38:43 GMT
> >>>> header: Content-Type: text/html; charset=UTF-8
> >>>> header: Transfer-Encoding: chunked
> >>>> header: Connection: keep-alive
> >>>> header: Vary: Cookie
> >>>> header: X-CF-Powered-By: WP 1.2.0
> >>>> header: X-Pingback: http://lavamagazine.com/xmlrpc.php
> >>>> header: Expires: Wed, 11 Jan 1984 05:00:00 GMT
> >>>> header: Last-Modified: Sat, 24 Mar 2012 12:38:43 GMT
> >>>> header: Cache-Control: no-cache, must-revalidate, max-age=0
> >>>> header: Pragma: no-cache
> >>>> status = HTTP/1.1 404 Not Found
> >>>> 
> >>>> I expected 1) isRedirected to be true, 2) the response code to be 301, 
> >>>> and/or 3) the destination URL to be in the headers where I could get it. 
> >>>>  However, if I ignore the 404 and continue getting the URL:
> >>>> 
> >>>>       HttpUriRequest currentReq = (HttpUriRequest) context.getAttribute(ExecutionContext.HTTP_REQUEST);
> >>>>       HttpHost currentHost = (HttpHost) context.getAttribute(ExecutionContext.HTTP_TARGET_HOST);
> >>>>       String currentUrl = currentReq.getURI().isAbsolute()
> >>>>               ? currentReq.getURI().toString()
> >>>>               : (currentHost.toURI() + currentReq.getURI());
> >>>>       httpclient.getConnectionManager().shutdown();
> >>>>       log.info("Redirected URL = " + currentUrl);
> >>>> 
> >>>> This does the right thing and provides me with the correct URL.  So, why 
> >>>> the 404 error?  I am processing a large quantity of URLs and need to 
> >>>> accurately determine which ones are errors, redirects, etc.
> >>>> 
> >>>> Thanks for any assistance.
> >>>> 
> >>>> Randy
> >>>> 
> >>> 
> >>> As far as I can tell HttpClient correctly redirects to the new location,
> >>> but the resource is simply no longer there.
> >>> 
> >>> [DEBUG] headers - >> GET /GGviSv HTTP/1.1
> >>> [DEBUG] headers - >> Host: bit.ly
> >>> [DEBUG] headers - >> Connection: Keep-Alive
> >>> [DEBUG] headers - >> User-Agent: Apache-HttpClient/4.2-beta2-SNAPSHOT
> >>> (java 1.5)
> >>> [DEBUG] headers - << HTTP/1.1 301 Moved
> >>> [DEBUG] headers - << Server: nginx
> >>> [DEBUG] headers - << Date: Sat, 24 Mar 2012 18:46:44 GMT
> >>> [DEBUG] headers - << Content-Type: text/html; charset=utf-8
> >>> [DEBUG] headers - << Connection: keep-alive
> >>> [DEBUG] headers - << Set-Cookie:
> >>> _bit=4f6e1694-00156-016bf-3d1cf10a;domain=.bit.ly;expires=Thu Sep 20
> >>> 18:46:44 2012;path=/; HttpOnly
> >>> [DEBUG] headers - << Cache-control: private; max-age=90
> >>> [DEBUG] headers - << Location:
> >>> http://lavamagazine.com/features/video-biking-the-ironman-melbourne-run-course/#axzz1pdAzTzT2
> >>> [DEBUG] headers - << MIME-Version: 1.0
> >>> [DEBUG] headers - << Content-Length: 185
> >>> [DEBUG] headers - >> GET /features/video-biking-the-ironman-melbourne-run-course/#axzz1pdAzTzT2 HTTP/1.1
> >>> [DEBUG] headers - >> Host: lavamagazine.com
> >>> [DEBUG] headers - >> Connection: Keep-Alive
> >>> [DEBUG] headers - >> User-Agent: Apache-HttpClient/4.2-beta2-SNAPSHOT
> >>> (java 1.5)
> >>> [DEBUG] headers - << HTTP/1.1 404 Not Found
> >>> [DEBUG] headers - << Server: nginx
> >>> [DEBUG] headers - << Date: Sat, 24 Mar 2012 18:46:45 GMT
> >>> [DEBUG] headers - << Content-Type: text/html; charset=UTF-8
> >>> [DEBUG] headers - << Transfer-Encoding: chunked
> >>> [DEBUG] headers - << Connection: keep-alive
> >>> [DEBUG] headers - << Vary: Cookie
> >>> [DEBUG] headers - << X-CF-Powered-By: WP 1.2.0
> >>> [DEBUG] headers - << X-Pingback: http://lavamagazine.com/xmlrpc.php
> >>> [DEBUG] headers - << Expires: Wed, 11 Jan 1984 05:00:00 GMT
> >>> [DEBUG] headers - << Last-Modified: Sat, 24 Mar 2012 18:46:45 GMT
> >>> [DEBUG] headers - << Cache-Control: no-cache, must-revalidate, max-age=0
> >>> [DEBUG] headers - << Pragma: no-cache
> >>> 
> >>> Oleg
> >>> 
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [email protected]
> >>> For additional commands, e-mail: [email protected]
> >>> 
> >> 
> >> Yet, if you hit the URL: 
> >> 
> >> http://lavamagazine.com/features/video-biking-the-ironman-melbourne-run-course/#axzz1pdAzTzT2
> >> 
> >> with your browser, the content comes up fine.  
> >> 
> >> Hitting the redirect URL with the standard Java HttpURLConnection class 
> >> does not produce the 404:
> >> 
> >>        String url = "http://bit.ly/GGviSv";
> >>        URL urlObj = new URL(url);
> >>        HttpURLConnection urlConnection = (HttpURLConnection) urlObj.openConnection();
> >>        urlConnection.setRequestMethod("GET");
> >>        urlConnection.setConnectTimeout(15000);
> >>        urlConnection.setReadTimeout(30000);
> >>        urlConnection.connect();
> >>        log.info("Response code = " + urlConnection.getResponseCode());
> >>        InputStream inputStream = urlConnection.getInputStream();
> >>        log.info("Redirected URL = " + urlConnection.getURL().toString());
> >> 
> >> This outputs:
> >> 
> >> Response code = 200
> >> Redirected URL = 
> >> http://lavamagazine.com/features/video-biking-the-ironman-melbourne-run-course/#axzz1pdAzTzT2
> >> 
> >> So HttpClient reports a 404, but HttpURLConnection reports a 200 and my 
> >> browsers (Safari, Chrome, and Firefox) all hit the link fine.
> >> 
> > 
> > It is not HttpClient reporting a wrong response status. It is the server
> > behaving incorrectly. I get the same 404 when accessing the location
> > directly. The problem is that the server does not correctly handle URI
> > fragment (the #axzz1pdAzTzT2 bit). The HTTP spec does not explicitly
> > state how fragments in redirect locations should be handled. So, in my
> > opinion it is a server side issue. 
> > 
> > You can work around the problem by using a custom redirect strategy that
> > rewrites the redirect location and strips away the fragment if present.
> > 
> > [DEBUG] headers - >> GET /features/video-biking-the-ironman-melbourne-run-course/#axzz1pdAzTzT2 HTTP/1.1
> > [DEBUG] headers - >> Host: lavamagazine.com
> > [DEBUG] headers - >> Connection: Keep-Alive
> > [DEBUG] headers - >> User-Agent: Apache-HttpClient/4.2-beta2-SNAPSHOT
> > (java 1.5)
> > [DEBUG] headers - << HTTP/1.1 404 Not Found
> > [DEBUG] headers - << Server: nginx
> > [DEBUG] headers - << Date: Sat, 24 Mar 2012 23:31:10 GMT
> > [DEBUG] headers - << Content-Type: text/html; charset=UTF-8
> > [DEBUG] headers - << Transfer-Encoding: chunked
> > [DEBUG] headers - << Connection: keep-alive
> > [DEBUG] headers - << Vary: Cookie
> > [DEBUG] headers - << X-CF-Powered-By: WP 1.2.0
> > [DEBUG] headers - << X-Pingback: http://lavamagazine.com/xmlrpc.php
> > [DEBUG] headers - << Expires: Wed, 11 Jan 1984 05:00:00 GMT
> > [DEBUG] headers - << Last-Modified: Sat, 24 Mar 2012 23:31:10 GMT
> > [DEBUG] headers - << Cache-Control: no-cache, must-revalidate, max-age=0
> > [DEBUG] headers - << Pragma: no-cache
> > 
> > 
> >> Here is another URL that is problematic:
> >> 
> >> http://on.wsj.com/GHGlfS
> >> 
> >> this produces:
> >> 
> >> org.apache.http.client.ClientProtocolException
> >>    at 
> >> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:822)
> >>    at 
> >> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
> >> ... snip ...
> >> Caused by: org.apache.http.ProtocolException: Invalid redirect URI: 
> >> http://blogs.wsj.com/speakeasy/2012/03/22/coroner-rules-whitney-houstonĂ¢??s-death-an-accident/?mod=e2tw
> >>    at 
> >> org.apache.http.impl.client.DefaultRedirectStrategy.createLocationURI(DefaultRedirectStrategy.java:185)
> >>    at 
> >> org.apache.http.impl.client.DefaultRedirectStrategy.getLocationURI(DefaultRedirectStrategy.java:116)
> >>    at 
> >> org.apache.http.impl.client.DefaultRedirectStrategy.getRedirect(DefaultRedirectStrategy.java:193)
> >>    at 
> >> org.apache.http.impl.client.DefaultRequestDirector.handleResponse(DefaultRequestDirector.java:1035)
> >>    at 
> >> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:492)
> >>    at 
> >> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
> >>    ... 28 more
> >> Caused by: java.net.URISyntaxException: Illegal character in path at index 
> >> 72: 
> >> http://blogs.wsj.com/speakeasy/2012/03/22/coroner-rules-whitney-houstonĂ¢??s-death-an-accident/?mod=e2tw
> >>    at java.net.URI$Parser.fail(URI.java:2809)
> >>    at java.net.URI$Parser.checkChars(URI.java:2982)
> >>    at java.net.URI$Parser.parseHierarchical(URI.java:3066)
> >>    at java.net.URI$Parser.parse(URI.java:3014)
> >>    at java.net.URI.<init>(URI.java:578)
> >>    at 
> >> org.apache.http.impl.client.DefaultRedirectStrategy.createLocationURI(DefaultRedirectStrategy.java:183)
> >>    ... 33 more
> >> 
> >> The redirected URL has a special character in it (single quote), and the 
> >> client doesn't handle that.  The Java code that I pasted above produces
> >> 
> > 
> > The URL has illegal character(s), which is the reason why the redirect
> > fails. 
> > 
> > Oleg
> > 
> >> Response code = 200
> >> Redirected URL = 
> >> http://blogs.wsj.com/speakeasy/2012/03/22/coroner-rules-whitney-houston%e2%80%99s-death-an-accident/?%3fs-death-an-accident/%3fmod=e2tw
> >> 
> >> Randy
> >> 
> >> 
> >> 
> > 
> > 
> > 
> > 
> 
> 
> 


