How strict should Jena IRI parsing be (of incoming files, not API calls)?

One observation: when using Fuseki, warnings end up in the log file ... and not everybody looks in the log file. If Fuseki is run as a service, the service operator is not the data user, so no one looks in the log file for warnings.

So I think we are really talking about error vs good, not warnings.



Having been poking around the IRI support recently, there are some oddball cases.


1/ "abc%XXdef" (any URI scheme)

URIs with illegal % encodings like "abc%XXdef" (% not followed by two hex digits) are syntactically illegal by the grammar of RFC 3986.

Currently, passes with a warning.
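
A minimal sketch of the check in case 1, in Python purely for illustration (this is not Jena code, and the function name is made up). It only tests that every "%" is followed by two hex digits, nothing else from the RFC 3986 grammar:

```python
import re

# A '%' that is NOT followed by two hex digits is an illegal percent-encoding.
_BAD_PCT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def has_bad_percent_encoding(iri: str) -> bool:
    """Hypothetical helper: True if the string contains a malformed %XX triplet."""
    return _BAD_PCT.search(iri) is not None
```

With this, "abc%XXdef" is flagged and "abc%20def" is not. Whether a hit becomes an error or a warning is exactly the policy question above.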


2/ URNs: bad namespace identifier (it has to be at least two characters) so "urn:x:abcde" is illegal by the URN scheme.

3/ URNs: not having the second colon. A URN is urn:NID:NSS (Namespace Identifier ; Namespace Specific String). "urn:abcde" is bad.

Currently, 2 and 3 pass. jena-iri treats them as errors unconditionally (this cannot be turned off or downgraded to a warning - there is special code to deal with this).
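
Cases 2 and 3 could be checked roughly as below. This is a simplified sketch in Python, not what jena-iri does: the function name is hypothetical, the namespace identifier pattern (2 to 32 characters, letters/digits/hyphen, no leading or trailing hyphen) is my reading of RFC 8141, and the namespace specific string is only checked for being non-empty.

```python
import re

# NID per (my reading of) RFC 8141: alphanum, up to 30 alphanum-or-hyphen, alphanum.
_NID = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]")

def urn_problem(iri: str):
    """Hypothetical helper: return a description of the problem, or None if none found."""
    if not iri.lower().startswith("urn:"):
        return "not a URN"
    # Split what follows "urn:" at the second colon into NID and NSS.
    nid, sep, nss = iri[4:].partition(":")
    if not sep:
        return "missing second colon"
    if not _NID.fullmatch(nid):
        return "bad namespace identifier"
    if not nss:
        return "empty namespace specific string"
    return None
```

Here "urn:x:abcde" reports a bad namespace identifier and "urn:abcde" reports the missing second colon.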

4/ "http:abc"

Plain weird. Strictly illegal (by the http scheme rules) but RFC 3986 allows it to be resolved against a base and treated as the relative URI "abc".
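
To make case 4 concrete, here is a small Python sketch of the non-strict rule in RFC 3986 section 5.2.2 (a reference whose scheme equals the base scheme may be treated as if it had no scheme). It uses the standard urllib.parse module; resolve_non_strict is a name invented for this example and is not how Jena resolves IRIs.

```python
from urllib.parse import urlsplit, urljoin

def resolve_non_strict(base: str, ref: str) -> str:
    """Hypothetical helper: resolve ref against base, ignoring a scheme equal to the base's."""
    b = urlsplit(base)
    r = urlsplit(ref)
    if r.scheme and r.scheme == b.scheme:
        # Drop "scheme:" so the rest is resolved as a relative reference.
        ref = ref[len(r.scheme) + 1:]
    return urljoin(base, ref)
```

Under this rule, resolving "http:abc" against "http://example/dir/file" gives "http://example/dir/abc". A strict parser would instead keep "http:abc" as it is, and that is the form that breaks the http scheme rules.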



Are there other URI schemes with scheme specific rules?

Supported so far:

http, urn, urn:uuid:, uuid (never official), file:

What others are in use which have additional syntactic restrictions? (DIDs for example). [Link to definitive document required]

    Andy
