jena-iri is difficult to maintain, as Arne recently discovered. There is only so much tweaking that can be done to it. It was written to be standalone, general and modular. It covers several URI schemes that don't arise for RDF/linked data/knowledge graphs. It is out of date with respect to more recent RFC revisions.

-- iri4ld

https://github.com/afs/x4ld/blob/main/iri4ld/README.md

iri4ld is an attempt at an IRI checker/parser that is more maintainable and focused on linked data needs. It supports syntax parsing for RFC 3986/3987 (i.e. UTF-8, not restricted to ASCII) as a single pass with a single object allocation; there is no regular expression usage in syntax parsing. It has no runtime dependencies.
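
To illustrate what "single pass, no regular expressions" can look like, here is a minimal sketch that finds the scheme of an IRI string with one left-to-right scan. It is not taken from iri4ld; the class and method names (SchemeScan, schemeEnd) are made up for this example, and it covers only the RFC 3986 scheme rule, not the full grammar.

```java
// Hypothetical sketch, not iri4ld code: locate the scheme of an IRI string
// in one left-to-right scan, without regular expressions.
final class SchemeScan {
    /** Returns the index of the ':' that ends a valid scheme, or -1 if there is none. */
    static int schemeEnd(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ':')
                return i > 0 ? i : -1;   // an empty scheme is not allowed
            boolean alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean other = (c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.';
            // RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
            if (!alpha && !(i > 0 && other))
                return -1;               // not a scheme, e.g. a relative path
        }
        return -1;                       // no ':' found
    }

    public static void main(String[] args) {
        System.out.println(schemeEnd("http://example/"));  // 4
        System.out.println(schemeEnd("/relative/path"));   // -1
        System.out.println(schemeEnd("1abc:def"));         // -1, must start with a letter
    }
}
```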

It also provides URI scheme validation for the latest RFCs for:

    http:, https:, did:, file:,
    urn:, urn:uuid:, urn:oid:, example:

and non-standard schemes

    uuid: and oid:

More can be added as a code change.

Unknown URI schemes are not rejected - they get the syntax parsing but no further validation.
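
A rough sketch of that behaviour, assuming a lookup table of per-scheme validators: a known scheme gets its extra checks, an unknown scheme passes through with no scheme-specific violations. All names here (SchemeChecks, VALIDATORS, checkHttp, validate) are hypothetical and are not the iri4ld API. The single http/https check shown, a non-empty authority, is only an example of a scheme rule, and the input is assumed to have already passed syntax parsing.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: names are illustrative and are not the iri4ld API.
final class SchemeChecks {
    // Each validator returns a list of violation messages (empty means no issues).
    static final Map<String, Function<String, List<String>>> VALIDATORS = Map.of(
        "http", SchemeChecks::checkHttp,
        "https", SchemeChecks::checkHttp);

    /** Example scheme rule: http and https IRIs need a non-empty authority. */
    static List<String> checkHttp(String iri) {
        String rest = iri.substring(iri.indexOf(':') + 1);
        if (!rest.startsWith("//"))
            return List.of("http/https: missing authority");
        int end = rest.length();
        for (int i = 2; i < rest.length(); i++) {
            char c = rest.charAt(i);
            if (c == '/' || c == '?' || c == '#') { end = i; break; }
        }
        return end == 2 ? List.of("http/https: empty authority") : List.of();
    }

    /** Scheme-specific validation; unknown schemes produce no violations. */
    static List<String> validate(String iri) {
        int colon = iri.indexOf(':');
        if (colon <= 0)
            return List.of();            // no scheme, so no scheme rules apply
        String scheme = iri.substring(0, colon).toLowerCase(Locale.ROOT);
        Function<String, List<String>> v = VALIDATORS.get(scheme);
        return v == null ? List.of() : v.apply(iri);
    }

    public static void main(String[] args) {
        System.out.println(validate("http://example/path"));  // no violations
        System.out.println(validate("http:///path"));         // one violation
        System.out.println(validate("foo:anything"));         // unknown scheme: no violations
    }
}
```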

If anyone works with data with other URI schemes, please email with a link to the definition.

-- iri4ld in Jena

Jena has a comprehensive IRIx test suite that covers the expectations. It captures user feedback based on what users encounter.

IRIProvider is the Jena adapter interface for an IRI library implementation [1]. Like jena-iri, error and warning handling of iri4ld can be configured. A provider can tune and adapt the exposed behaviour of the IRI library.

In Jena, all scheme specific violations become RIOT warnings (as before). Only syntax errors for IRIs are RIOT parser errors - these cause problems when writing legal RDF data.

iri4ld has different text for messages (there is a way to use jena-iri messages but IMO the jena-iri messages are a bit cryptic). This is the biggest change.

-- Current status

iri4ld passes the Jena test suite except that it has thrown up one area to investigate. Should UTF-8 characters in URNs be allowed? [2]

There will be further checking to do - this is a change to something that underpins a lot of the system.

The change should happen at the beginning of a development cycle.

-- What to do with jena-iri?

iri4ld uses jena-iri in its test suite to compare (with a lot of noted exceptions).

There are three options. jena-iri could be moved, and its package and module name reused for iri4ld. We could have a new jena-<somename> and leave jena-iri in place. Or we could retire jena-iri; the new IRI code would then not test against it, but by now the iri4ld test suite and the Jena IRIx test suites should be covering everything.

    Andy

[1] The JDK URI parsing of java.net.URI is very out of date. It is RFC 2396, with some tweaks. There is already an IRIProviderJDK for java.net.URI. The Jena test suite shows it is a long way off.
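
One place where the RFC 2396 heritage shows is reference resolution. The sketch below uses only java.net.URI; the class name JdkResolve is made up for the example. For a reference with more ".." segments than the base path has, RFC 3986 section 5.4.2 gives "http://a/g", while RFC 2396 appendix C.2 lists "http://a/../g", and java.net.URI is expected to follow the older form.

```java
import java.net.URI;

// Sketch: resolve a relative reference using the JDK's java.net.URI.
final class JdkResolve {
    static String resolve(String base, String ref) {
        return URI.create(base).resolve(ref).toString();
    }

    public static void main(String[] args) {
        // Base and reference are the standard examples from the RFCs.
        // RFC 3986 section 5.4.2 expects "http://a/g" for this input.
        System.out.println(resolve("http://a/b/c/d;p?q", "../../../g"));
        // An ordinary case where both RFCs agree: "http://a/b/g"
        System.out.println(resolve("http://a/b/c/d;p?q", "../g"));
    }
}
```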

[2] Background: URNs are strictly ASCII. The key question for linked data usage is whether to allow UTF-8 beyond ASCII in URNs in general. There are arguments for and against.
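
For concreteness, the kind of check under discussion could look like the sketch below: detect a urn: IRI and report the position of the first character outside ASCII. The names (UrnAscii, isUrn, firstNonAscii) are hypothetical and this is not how iri4ld or Jena implement it. Whether such a position is reported as a warning, an error, or not at all is exactly the open question.

```java
// Hypothetical sketch: find non-ASCII characters in a URN.
final class UrnAscii {
    /** True if the string starts with "urn:" (the scheme is case-insensitive). */
    static boolean isUrn(String iri) {
        return iri.regionMatches(true, 0, "urn:", 0, 4);
    }

    /** Index of the first char above U+007F, or -1 if the string is all ASCII. */
    static int firstNonAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F)
                return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String urn = "urn:example:caf\u00e9";   // ends in e-acute
        if (isUrn(urn) && firstNonAscii(urn) >= 0)
            System.out.println("non-ASCII character at index " + firstNonAscii(urn));
    }
}
```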
