The Atom specification allows IRIs to be used anywhere within Atom documents. Unfortunately, Java 1.5 and earlier do not include support for converting IRIs to URIs, which is necessary in order to get a dereferenceable URI. Currently we fake it by parsing the IRI straight into a java.net.URI, but that definitely has a number of problems.
For instance, consider the following feed:

    http://www.詹姆斯.com/feed (James Holderness' weblog)

If I do:

    URI uri = new URI("http://www.詹姆斯.com/feed");

The URI will be created without throwing any errors, despite the fact that the Unicode characters are not legal in a URI. Calling uri.toString() will return the URI as given. However, calling uri.getHost() on this URI improperly returns null. Calling uri.getAuthority() returns the host name, but if the URI also has a port specified, getAuthority() also returns the port (e.g. for "http://www.詹姆斯.com:80/feed", getAuthority() returns "www.詹姆斯.com:80").

Worse yet, if I call uri.toASCIIString(), the output from URI is http://www.%E8%A9%B9%E5%A7%86%E6%96%AF.com/feed, which is quite clearly wrong: the host name has been percent-encoded rather than converted with IDN.

Now, all of our (IBM's) implementations have ICU [1] available, which includes proper IDN support. With it, it's a simple matter to write an IRI to URI converter. Unfortunately, this is *really* slow, and ICU is a big package (3.08M for the jar) and we really don't have need for the whole thing. It's fine for platforms that already have ICU, but requiring an additional 3.08M download so we can slowly convert an IRI to a URI really bugs me.

That said, however, I'm not sure how we can get around it. Even the Jena project's IRI implementation (generally considered by those more knowledgeable about this than I to be pretty good) depends on ICU.

So, anyway, long story short: if we want proper support for IRIs (which we need), then we're going to have to introduce a dependency on ICU. I'm not happy about it, but I don't see any other way around it.

Thoughts?

- James

[1] http://www-306.ibm.com/software/globalization/icu/index.jsp
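
P.S. For anyone who wants to reproduce the behaviour described above, here is a minimal sketch. The class name IriSketch and the helper toAsciiUri are hypothetical, made up for illustration. The helper uses java.net.IDN purely as a stand-in for ICU's IDN support; that class only exists in Java 6 and later, so it does not solve the Java 1.5 problem. It also assumes a plain hierarchical IRI (scheme://host:port/path) and is not a complete IRI to URI implementation.

```java
import java.net.IDN;
import java.net.URI;

public class IriSketch {

    // The host from the example above (www.<three CJK characters>.com), written
    // with Unicode escapes so the file compiles under any source encoding.
    public static final String HOST = "www.\u8A79\u59C6\u65AF.com";

    /**
     * Hypothetical helper, not taken from ICU or any existing library.
     * Converts the host with java.net.IDN (Java 6+, assumed here as a stand-in
     * for ICU) and lets java.net.URI percent-encode whatever is left.
     */
    public static String toAsciiUri(String iri) {
        URI parsed = URI.create(iri);
        String authority = parsed.getRawAuthority();
        if (authority == null) {
            return parsed.toASCIIString(); // no host to convert
        }

        // getHost() is null for a non-ASCII host, so split the authority by hand.
        String userInfo = "";
        int at = authority.lastIndexOf('@');
        if (at >= 0) {
            userInfo = authority.substring(0, at + 1);
            authority = authority.substring(at + 1);
        }
        String port = "";
        int colon = authority.lastIndexOf(':');
        if (colon > authority.lastIndexOf(']')) { // ignore colons inside an IPv6 literal
            port = authority.substring(colon);
            authority = authority.substring(0, colon);
        }

        // IPv6 literals are left alone; everything else goes through IDN.
        String host = authority.startsWith("[") ? authority : IDN.toASCII(authority);

        StringBuilder sb = new StringBuilder();
        if (parsed.getScheme() != null) {
            sb.append(parsed.getScheme()).append(':');
        }
        sb.append("//").append(userInfo).append(host).append(port);
        if (parsed.getRawPath() != null) {
            sb.append(parsed.getRawPath());
        }
        if (parsed.getRawQuery() != null) {
            sb.append('?').append(parsed.getRawQuery());
        }
        if (parsed.getRawFragment() != null) {
            sb.append('#').append(parsed.getRawFragment());
        }
        // The host is ASCII now, so only path, query and fragment get percent-encoded.
        return URI.create(sb.toString()).toASCIIString();
    }

    public static void main(String[] args) {
        String iri = "http://" + HOST + ":80/feed";
        URI uri = URI.create(iri);
        System.out.println(uri.getHost());       // null, as described above
        System.out.println(uri.getAuthority());  // host name plus ":80"
        System.out.println(uri.toASCIIString()); // host comes out percent-encoded
        System.out.println(toAsciiUri(iri));     // host as an xn-- label instead
    }
}
```

The last line is the only one that yields a host java.net.URI can parse, so getHost() works on the result. Whether something like this is good enough on platforms that have java.net.IDN is a separate question from the ICU dependency.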
