On Mon, 2011-10-24 at 14:15 +1100, Jack Hatch wrote:
> Hey all,
>
> Bit of a weird one. I'm using HTTPClient 4.1.2, and it seems that whenever
> it finds a URL with something like a '#' in it, it does a full GET with
> the # in the URL.
>
> For example, trying to get the URL http://stks.co/eWt will redirect to the
> URL
> http://news.ichinastock.com/2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter.
> Now this URL is live, but the problem is that HTTPClient sends a GET request
> with the URI set to
> /2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter
> which causes the server to send back a 404 page not found.
>
> Looking at the GET sent by IE, Firefox and cURL, they all strip out the #...
> from the end of the URI, so for example the cURL GET request URI is set to
> /2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/ - all of
> the #... has been removed. This is for the exact same entry URL of
> http://stks.co/eWt.
>
> As a test, sending this raw URL into HTTPClient (i.e.
> HttpGet httpget = new HttpGet("http://news.ichinastock.com/2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter");)
> gives the same 404 not found result.
> The issue is that I don't know in advance whether the URL has an #anchor in
> it, as it comes from a short URL service...
>
> So the question is: are there any settings in HTTPClient that can be set so
> that things like the trailing #... are automatically removed from URLs? Or
> how would I go about manually removing this from URLs (remember that I would
> need to capture all redirect URLs as well)?
>
> Cheers!

You can use a custom RedirectStrategy and reformat / modify redirect
locations as you see fit. Most likely all you need is to subclass
DefaultRedirectStrategy and override its #createLocationURI method.

Oleg
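
For the "manually removing" part of the question, the fragment-stripping step itself could be sketched roughly as below. This is only an illustration using the JDK, not HttpClient code: the class name FragmentStripper and the method stripFragment are made up for the example. It relies on the fact that in a URI the first '#' always starts the fragment, so everything from that character onward can be cut off. Presumably an overridden createLocationURI would call such a helper on the Location string before delegating to the superclass, and the same helper could be applied to the initial URL before it is passed to new HttpGet(...).

```java
/**
 * Hypothetical helper (not part of HttpClient): removes the fragment,
 * i.e. the first '#' and everything after it, from a URL string.
 */
final class FragmentStripper {

    private FragmentStripper() {
        // static utility, no instances
    }

    /**
     * Returns the URL without its fragment. URLs that contain no '#'
     * are returned unchanged. The query string, if any, comes before
     * the fragment and is therefore preserved.
     */
    static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }
}
```

With the redirect target from the original message, stripFragment should return the URL ending in ".../jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/", which matches the request path that cURL and the browsers were observed to send. Plain string handling is used here rather than rebuilding the URL through the multi-argument java.net.URI constructors, since those re-quote percent characters and could alter an already-encoded path.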
