Hey all, bit of a weird one. I'm using HttpClient 4.1.2, and it seems that whenever it encounters a URL with a '#' in it, it sends the GET with the '#' part still in the request URI.
For example, trying to get the URL http://stks.co/eWt redirects to http://news.ichinastock.com/2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter. That URL is live, but HttpClient sends a GET request with the URI set to /2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter, which causes the server to send back a 404 Not Found page.

Looking at the GET requests sent by IE, Firefox and cURL for the exact same entry URL (http://stks.co/eWt), they all strip the #... from the end of the URI. For example, the cURL GET request URI is /2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/, with everything from the # onwards removed.

As a test, passing the raw URL straight into HttpClient, i.e. HttpGet httpget = new HttpGet("http://news.ichinastock.com/2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter");, gives the same 404 Not Found result.

The issue is that I don't know in advance whether a URL has a #anchor in it, as it comes from a short URL service. So the question is: are there any settings in HttpClient that make it automatically remove the trailing #... from URLs? Or, failing that, how would I go about removing it manually (bearing in mind that I would need to catch all redirect URLs as well)?

Cheers!
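For the manual route, a minimal sketch using only plain Java string handling might look like the following. It relies on the fact that in a URI the first '#' always starts the fragment, which is meant for the client only and is not supposed to be sent to the server. The class name FragmentStripper and the method stripFragment are made up for illustration and are not part of HttpClient. The same helper would also have to be applied to each redirect's Location value, and how to hook that into HttpClient is not shown here.

```java
// Hypothetical helper, not part of HttpClient: removes the fragment
// ("#..." and everything after it) from a URL string.
class FragmentStripper {

    static String stripFragment(String url) {
        // Trim stray whitespace first, since pasted URLs often carry it.
        String trimmed = url.trim();
        // The first '#' in a URI delimits the fragment; a literal '#'
        // anywhere else has to be percent-encoded as %23.
        int hash = trimmed.indexOf('#');
        return hash < 0 ? trimmed : trimmed.substring(0, hash);
    }

    public static void main(String[] args) {
        String redirected = "http://news.ichinastock.com/2011/10/"
                + "jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/"
                + "#.Tpw-xG61XjU.twitter";
        // Prints the URL ending in ".../to-acquire-yahoo/" with no fragment.
        System.out.println(stripFragment(redirected));
    }
}
```

The cleaned string could then be passed to new HttpGet(...) in place of the raw URL.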
