On Jun 4, 2012, at 5:41pm, Mugoma Joseph Okomba wrote:
> Hello,
>
> While trying to use HttpClient 4.2 to download page I am getting:
>
> java.net.URISyntaxException: Illegal character in query at index 85:
> http://www.target.com/webapp/wcs/stores/servlet/s?searchTerm=Jared+Diamond&category=0|All|matchallany|all+categories
>
>
> On HttpClient 3.x I get similar error:
>
> java.lang.IllegalArgumentException: Invalid uri
> 'http://www.target.com/webapp/wcs/stores/servlet/s?searchTerm=Jared+Diamond&category=0|All|matchallany|all+categories':
> Invalid query
>
>
> However using the native Java download causes no error:
>
> URL getURL = new URL(url);
> HttpURLConnection huc = ( HttpURLConnection ) getURL.openConnection ();
> huc.setRequestMethod("GET");
> InputStream inps = null;
> try{
> huc.connect();
> inps = (InputStream) huc.getInputStream();
> }
>
>
> The URL is valid and accessible. How can one make HttpClient resolve such
> URL?
This issue has come up on occasion in the past: the java.net.URI class is more
restrictive than the java.net.URL class, most browsers, and most DNS software.
In your case it's failing because '|' (vertical bar) is not considered a valid
character by Java's URI class (which is used internally by HttpClient), but it
is OK for a URL. Which always struck me as odd, since most people talk about
URLs being a subset of URIs :)
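
One way around this is to percent-encode the offending character before the string reaches HttpClient. The sketch below is only an illustration: the class name PipeUriDemo and the helper toEncodedUri are made up for this example. It leans on the documented behavior of the multi-argument java.net.URI constructor, which quotes characters that are illegal in each component, so '|' becomes %7C.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class PipeUriDemo {

    // True if java.net.URI accepts the string exactly as written.
    public static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Hypothetical helper: split the string with the lenient URL parser,
    // then rebuild it with the multi-argument URI constructor, which
    // percent-encodes characters that are illegal in each component.
    // Caveat: that constructor always encodes '%', so input that already
    // contains percent-escapes would end up double-encoded.
    public static URI toEncodedUri(String s) {
        try {
            URL url = new URL(s);
            return new URI(url.getProtocol(), url.getAuthority(),
                           url.getPath(), url.getQuery(), url.getRef());
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot convert to URI: " + s, e);
        }
    }

    public static void main(String[] args) {
        String raw = "http://www.target.com/webapp/wcs/stores/servlet/s"
                   + "?searchTerm=Jared+Diamond&category=0|All|matchallany|all+categories";
        System.out.println(parsesAsUri(raw));   // the raw string is rejected
        System.out.println(toEncodedUri(raw));  // same URL with each '|' as %7C
    }
}
```

The resulting URI (or its toString()) can then be handed to HttpGet. If the only problem character is '|', a plain raw.replace("|", "%7C") does the same job and avoids the double-encoding caveat.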
Going back in time, RFC 1630 (T. Berners-Lee, CERN, 1994) classifies the vertical
bar (called "vline" in the spec) as a "national" character:

    national    { | } | vline | [ | ] | \ | ^ | ~

And then says:

    The "national" and "punctuation" characters do not appear in any
    productions and therefore may not appear in URIs.

So technically speaking the URI class is doing the right thing.
You'll run into a similar issue with subdomains that start with '-', e.g.
-angelcries.blogspot.com can be used to construct a URL, but java.net.URI will
not parse it as a valid host name.
Because DNS software & browsers are permissive, you'll find a number of these
cases where web pages can't be fetched using HttpClient.
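
The host name case can be checked with a few lines of Java. This is a sketch with a made-up class name (HyphenHostDemo), and it reflects the behavior of the JDK versions I am aware of: new URI(...) does not throw for such a host, but it falls back to treating the authority as registry-based, so getHost() comes back null, while URL.getHost() returns the name unchanged.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class HyphenHostDemo {

    // Host as seen by java.net.URI, or null if it cannot extract one.
    public static String uriHost(String s) {
        try {
            return new URI(s).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Host as seen by the more lenient java.net.URL parser.
    public static String urlHost(String s) {
        try {
            return new URL(s).getHost();
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String odd = "http://-angelcries.blogspot.com/";
        System.out.println("URL host: " + urlHost(odd));
        System.out.println("URI host: " + uriHost(odd));
    }
}
```

A null host from the URI is a quick way to spot these addresses before passing them to HttpClient.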
-- Ken
Ken Krugler
http://www.scaleunlimited.com