IRIs

Andy Seaborne Sun, 31 Jan 2021 13:48:18 -0800

How strict should Jena IRI parsing be (of incoming files, not API calls)?

One observation: when using Fuseki, warnings end up in the log file ... and not everybody looks in the log file. If Fuseki is used as a service, the service operator isn't the data user - no one looks in the log file for warnings.


So I think we are really talking about error vs good, not warnings.

Having been poking around the IRI support recently, I have found some oddball cases.



1/ "abc%XXdef" (any URI scheme)

URIs with illegal % encodings like "abc%XXdef" (the two characters after the % are not hex digits) are syntactically illegal by the grammar of RFC3986.


Currently, passes with a warning.
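
To make case 1 concrete, here is a minimal sketch of the check in Python. It is not Jena code; the function name bad_percent_offsets is made up for illustration, and it only tests the "% followed by two hex digits" rule, nothing else from the RFC3986 grammar.

```python
import re

# A '%' is only legal when followed by exactly two hex digits
# (the pct-encoded rule of RFC 3986). The negative lookahead finds
# every '%' that is NOT followed by two hex digits.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def bad_percent_offsets(iri):
    """Return the offset of every '%' that does not start a valid %HH escape.

    Hypothetical helper for illustration only; an empty list means the
    percent-encoding in the string is syntactically fine.
    """
    return [m.start() for m in _BAD_PERCENT.finditer(iri)]


if __name__ == "__main__":
    for candidate in ("abc%XXdef", "abc%20def", "abc%2"):
        print(candidate, bad_percent_offsets(candidate))
```

Whether a non-empty result is then reported as a warning or as an error is exactly the policy question being asked here.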

2/ URNs: bad namespace identifier (it has to be at least two characters) so "urn:x:abcde" is illegal by the URN scheme.

3/ URNs: not having the second colon. A URN is urn:NID:NSS (Namespace Identifier ; Namespace Specific String). "urn:abcde" is bad.

Currently, 2 and 3 pass. jena-iri treats them as errors unconditionally (this cannot be turned off or made a warning - there is special code to deal with this).
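
Cases 2 and 3 can be sketched as a small Python check. This is an illustration, not what jena-iri does: check_urn is a hypothetical name, the NID pattern is my reading of RFC 8141 (2 to 32 characters, alphanumeric at both ends, hyphens allowed in between), and the NSS is only checked for being non-empty, not for its allowed characters.

```python
import re

# Namespace identifier, as I read RFC 8141: an alphanumeric, up to 30
# alphanumerics or hyphens, then an alphanumeric (so 2..32 characters).
_NID = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]")


def check_urn(iri):
    """Return None if the string looks like urn:NID:NSS, else a short reason.

    Hypothetical helper for illustration; it does not validate the
    characters of the namespace specific string.
    """
    scheme, sep, rest = iri.partition(":")
    if not sep or scheme.lower() != "urn":
        return "not a URN"
    nid, sep, nss = rest.partition(":")
    if not sep:
        return "missing second colon (expected urn:NID:NSS)"
    if not _NID.fullmatch(nid):
        return "bad namespace identifier"
    if not nss:
        return "empty namespace specific string"
    return None


if __name__ == "__main__":
    for candidate in ("urn:x:abcde", "urn:abcde", "urn:example:abcde"):
        print(candidate, "->", check_urn(candidate))
```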


4/ "http:abc"

Plain weird. Strictly illegal (by http scheme rules) but RFC3986 allows it to be resolved against a base and treated as the relative URI "abc".
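
A rough way to see case 4, using only the Python standard library (not Jena). The helper name http_missing_host and the base URI are made up for illustration. The sketch flags an http/https URI that has no authority part; what a given resolver then does with such a string against a base varies, so the script just prints what urljoin returns rather than asserting an outcome.

```python
from urllib.parse import urljoin, urlsplit


def http_missing_host(iri):
    """True if the string uses the http or https scheme but has no authority.

    Hypothetical helper for illustration; urlsplit lowercases the scheme
    and leaves netloc empty when there is no "//authority" part.
    """
    parts = urlsplit(iri)
    return parts.scheme in ("http", "https") and parts.netloc == ""


if __name__ == "__main__":
    parts = urlsplit("http:abc")
    # The scheme is recognised, there is no authority, and "abc" is the path.
    print(parts.scheme, repr(parts.netloc), parts.path)
    print(http_missing_host("http:abc"))
    # Illustrative base only: shows which reading this particular resolver takes.
    print(urljoin("http://example/dir/file", "http:abc"))
```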




Are there other URI schemes with scheme specific rules?

Supported so far:

http, urn, urn:uuid:, uuid (never official), file:

What others are in use which have additional syntactic restrictions? (DIDs for example). [Link to definitive document required]


    Andy
