I could be accused of having gone overboard ... each of the slightly different specs as an explicit representation in the code ... Having changed job and looking at this from a different perspective, I am less convinced by the pickiness. There is a largely unrealized goal in the IRI code to link errors right back to the specs, so that the error messages quote chapter and verse.

Jeremy


On 11/9/2010 8:52 AM, Andy Seaborne wrote:


On 09/11/10 16:22, Florent Guillaume wrote:
On Tue, Nov 9, 2010 at 1:19 PM, Andy Seaborne
<andy.seabo...@epimorphics.com>  wrote:
Jeremy identified the IRI library as a potential contribution to a commons area. It is free-standing, and does not use or call any Jena RDF code - it depends only on ICU4J (and JUnit + JFlex in the build).

Please note that Abdera already has an IRI library.
http://svn.apache.org/repos/asf/abdera/java/trunk/dependencies/i18n/src/main/java/org/apache/abdera/i18n/iri/

Florent,

Thanks for pointing that out. I see it has a test suite as well and it would be good to make sure we've got things right.

Illegal IRIs have been a bit of a plague in RDF data, and the IRI library (written by Jeremy) is a response to that. It checks both the rules for specific IRI schemes and the recommended forms, since IRIs are often compared for equality. The library is quite picky. It includes profiles for RDF URI references, for IRIs, and for the compromise we use in Jena as a balance of legacy and spec exactness.

There is an online test service for RDF data in non-RDF/XML formats at:

http://sparql.org/data-validator.html

The IRIs are checked with the IRI library.

    Andy

A few examples:

http://example/a b

Code: 17/WHITESPACE in PATH: A single whitespace character. These match no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.

http://example/a[]b

Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.

http://example:80/

Code: 13/DEFAULT_PORT_SHOULD_BE_OMITTED in PORT: If the port is the default one for the scheme it should be omitted.

Code: 14/PORT_SHOULD_NOT_BE_WELL_KNOWN in PORT: Ports under 1024 should be accessed using the appropriate scheme name.

urn:xyz

Code: 61/SCHEME_PATTERN_MATCH_FAILED in PATH: The scheme specific syntax rules are violated.
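
To make the kind of checking in these examples concrete, here is a minimal, self-contained Java sketch. It does not use the IRI library or its API: the class name IriCheckSketch and the method check are hypothetical, the violation names are copied from the validator output above, and the default-port table and the URN pattern are simplifying assumptions rather than the library's actual rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT the IRI library's API.
class IriCheckSketch {
    // Generic component splitter from RFC 3986, Appendix B.
    private static final Pattern PARTS = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?",
        Pattern.DOTALL);

    // Assumption: only these two schemes have a known default port here.
    private static final Map<String, Integer> DEFAULT_PORTS =
        Map.of("http", 80, "https", 443);

    // Assumption: a simplified "NID:NSS" shape for the part after "urn:".
    private static final Pattern URN_PATH =
        Pattern.compile("[A-Za-z0-9][A-Za-z0-9-]{0,31}:.+");

    static List<String> check(String iri) {
        List<String> found = new ArrayList<>();
        Matcher m = PARTS.matcher(iri);
        if (!m.matches()) {
            throw new IllegalStateException("component regex did not match");
        }
        String scheme = m.group(2) == null
            ? "" : m.group(2).toLowerCase(Locale.ROOT);
        String authority = m.group(4);
        String path = m.group(5);

        // Path checks: whitespace and square brackets are not allowed.
        for (char c : path.toCharArray()) {
            if (Character.isWhitespace(c)) {
                add(found, "WHITESPACE in PATH");
            } else if (c == '[' || c == ']') {
                add(found, "ILLEGAL_CHARACTER in PATH");
            }
        }

        // Port checks: the port is the digits after the last ':' that
        // follows any bracketed IPv6 literal.
        if (authority != null) {
            int colon = authority.lastIndexOf(':');
            if (colon > authority.lastIndexOf(']')) {
                String portText = authority.substring(colon + 1);
                if (portText.matches("\\d{1,5}")) {
                    int port = Integer.parseInt(portText);
                    Integer def = DEFAULT_PORTS.get(scheme);
                    if (def != null && def == port) {
                        add(found, "DEFAULT_PORT_SHOULD_BE_OMITTED in PORT");
                    }
                    if (port < 1024) {
                        add(found, "PORT_SHOULD_NOT_BE_WELL_KNOWN in PORT");
                    }
                }
            }
        }

        // Scheme-specific check, shown for "urn" only.
        if (scheme.equals("urn") && !URN_PATH.matcher(path).matches()) {
            add(found, "SCHEME_PATTERN_MATCH_FAILED in PATH");
        }
        return found;
    }

    // Record each violation at most once, in order of discovery.
    private static void add(List<String> found, String violation) {
        if (!found.contains(violation)) {
            found.add(violation);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example/a b", "http://example/a[]b",
            "http://example:80/", "urn:xyz"
        };
        for (String s : samples) {
            System.out.println("<" + s + "> " + check(s));
        }
    }
}
```

Returning a list of violations rather than failing on the first one mirrors the validator output above, where a single IRI such as http://example:80/ can be reported under more than one code.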

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org


