Ken Krugler wrote:
I'm wondering if anybody else has encountered this problem...

I've got a funky URL: "http://-angelcries.blogspot.com/"

Note the leading '-' in the subdomain.

This works fine in Firefox, and gives no problem with the URL class.

But the URI class throws a URISyntaxException if you use a host name with a leading '-'.
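
To make the difference concrete, here is a minimal sketch against a stock JDK. The class and method names are made up for illustration. As far as I can tell, the single-string URI constructor quietly falls back to a registry-based authority (so getHost() returns null), and the exception only surfaces once a server-based authority is required, e.g. via parseServerAuthority() or the multi-argument constructors. The helper below catches it in either case.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical demo class, not part of HttpClient or the JDK.
class LeadingDashHost {
    static final String SPEC = "http://-angelcries.blogspot.com/";

    // java.net.URL does not validate host names, so the host comes back as written.
    @SuppressWarnings("deprecation") // the URL constructor is deprecated in recent JDKs
    static String hostFromUrl(String spec) {
        try {
            return new URL(spec).getHost();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // True if java.net.URI refuses to treat the authority as a server-based one.
    static boolean uriRejectsHost(String spec) {
        try {
            new URI(spec).parseServerAuthority();
            return false;
        } catch (URISyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("URL host: " + hostFromUrl(SPEC));
        System.out.println("URI rejects host: " + uriRejectsHost(SPEC));
    }
}
```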

I've read through the relevant portions of RFCs 3986, 2396, 1123, 1034 and 952. The general trend has been toward more lenient domain name specifications over time, but even today subdomains and domains should not start with '-'. However, it's clear that DNS servers allow the use of a leading dash for subdomains.

Since HttpClient is often used to fetch content that is browsed by users, it would be an admirable goal to work around this problem - but the only solution I see is to use a custom URI class instead of what's in the JDK.

Based on what I see in other projects (e.g. Tomcat) this process of replacing default implementations with custom versions winds up being a path that's often taken, unfortunately, due to issues like this one.


Ken

(1) HttpClient 3.x has its own URI implementation. Sadly, it turned out to be the ugliest and most troublesome area of the entire project, one that no one was willing to work on or even do minimal maintenance on. This is the reason why it got replaced with the standard Java URI implementation in HttpClient 4.x.

(2) It is simply not possible to replace java.net.URI with something else without causing a major API breakage. I personally do not think it is worth it.

It should be possible, though, to investigate the feasibility of replacing the standard URI parsing routine with a more lenient one.
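
As a rough illustration of what "more lenient" could mean while still keeping java.net.URI in the API, the sketch below falls back to the raw authority when getHost() comes back null. The lenientHost helper is purely hypothetical (nothing like it exists in HttpClient), and it assumes the single-string URI constructor accepts the input as a registry-based authority, which is what I would expect from the stock JDK parser.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, shown only to illustrate a lenient host lookup.
class LenientHosts {

    // Returns the host of an absolute URI, tolerating host names that
    // java.net.URI does not accept as a server-based authority.
    static String lenientHost(String spec) {
        URI uri;
        try {
            uri = new URI(spec);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
        String host = uri.getHost();
        if (host != null) {
            return host; // the strict parser was happy with it
        }
        String authority = uri.getRawAuthority();
        if (authority == null) {
            return null; // opaque or authority-less URI
        }
        // Drop an optional "userinfo@" prefix.
        int at = authority.lastIndexOf('@');
        String hostPort = at >= 0 ? authority.substring(at + 1) : authority;
        // Drop an optional ":port" suffix, but only if it is all digits.
        int colon = hostPort.lastIndexOf(':');
        if (colon >= 0 && hostPort.substring(colon + 1).chars().allMatch(Character::isDigit)) {
            hostPort = hostPort.substring(0, colon);
        }
        return hostPort.isEmpty() ? null : hostPort;
    }

    public static void main(String[] args) {
        System.out.println(lenientHost("http://-angelcries.blogspot.com/"));
    }
}
```

This only recovers the host name; whether the rest of the request pipeline copes with a URI whose getHost() is null is a separate question.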

Web crawlers that need to be able to handle non-standard or broken URIs as well as tolerate other non-standard behaviors might be much better off using HttpCore directly, possibly re-using the connection management components from HttpClient.

Oleg
