How strict should Jena's IRI parsing be when reading incoming files (as opposed to API calls)?
One observation: when using Fuseki, warnings end up in the log file,
and not everybody looks in the log file. If Fuseki is run as a service,
the service operator is not the same person as the data users, so no one
looks in the log file for warnings.
So I think we are really talking about error vs. OK, not warnings.
Having been poking around the IRI support recently, I have come across
some oddball cases.
1/ "abc%XXdef" (any URI scheme)
URIs with illegal % encodings like "abc%XXdef" ("XX" is not two hex
digits) are syntactically illegal under the grammar of RFC 3986.
Currently, this passes with a warning.
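For reference, a minimal sketch of what "error, not warning" would check for case 1. The class and method names are hypothetical (this is not jena-iri code); it only tests that every '%' begins an RFC 3986 pct-encoded triplet ("%" HEXDIG HEXDIG) and ignores every other part of the grammar.

```java
/**
 * Hypothetical helper (not a Jena or jena-iri API): finds the first '%'
 * that does not start an RFC 3986 pct-encoded triplet ("%" HEXDIG HEXDIG).
 */
public class PercentEncodingCheck {

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    /** Returns the index of the first malformed '%' escape, or -1 if there is none. */
    public static int firstBadPercent(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != '%')
                continue;
            // A '%' must be followed by exactly two hex digits.
            if (i + 2 >= s.length()
                    || !isHexDigit(s.charAt(i + 1))
                    || !isHexDigit(s.charAt(i + 2)))
                return i;
            i += 2; // skip over the two hex digits
        }
        return -1;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc%XXdef", "abc%20def", "abc%2"}) {
            int bad = firstBadPercent(s);
            System.out.println(s + " -> " + (bad < 0 ? "ok" : "bad '%' at index " + bad));
        }
    }
}
```

A check like this is cheap and unambiguous, which is part of the argument for making it an error rather than a warning.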
2/ URNs: bad namespace identifier. The NID has to be at least two
characters, so "urn:x:abcde" is illegal by the URN scheme rules.
3/ URNs: missing the second colon. A URN is urn:NID:NSS (Namespace
Identifier; Namespace Specific String), so "urn:abcde" is bad.
Currently, 2 and 3 pass. jena-iri treats them as errors
unconditionally (this cannot be turned off or downgraded to a warning;
there is special code to deal with this).
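A sketch of the two URN checks (cases 2 and 3), assuming the RFC 8141 shape urn:NID:NSS where the NID is 2 to 32 letters, digits or hyphens, starting and ending with a letter or digit. UrnCheck is a hypothetical name, not jena-iri code; it does not validate the NSS character set or the optional r-, q- and f-components.

```java
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper: checks only the NID and the second colon of a URN. */
public class UrnCheck {

    // RFC 8141 NID: 2 to 32 characters of letters, digits and '-',
    // starting and ending with a letter or digit.
    private static final Pattern NID =
        Pattern.compile("[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]");

    /** Returns a reason if the string is a bad URN, or empty if these checks pass. */
    public static Optional<String> check(String uri) {
        // The "urn" scheme name is case-insensitive.
        if (!uri.regionMatches(true, 0, "urn:", 0, 4))
            return Optional.of("not a URN");
        String rest = uri.substring(4);
        int colon = rest.indexOf(':');
        if (colon < 0)
            return Optional.of("missing second colon");      // case 3: "urn:abcde"
        if (!NID.matcher(rest.substring(0, colon)).matches())
            return Optional.of("bad namespace identifier");  // case 2: "urn:x:abcde"
        if (colon + 1 == rest.length())
            return Optional.of("empty namespace specific string");
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"urn:x:abcde", "urn:abcde", "urn:isbn:0451450523"}) {
            System.out.println(s + " -> " + check(s).orElse("ok"));
        }
    }
}
```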
4/ "http:abc"
Plain weird. Strictly illegal (by the http scheme rules), but RFC 3986
allows it to be resolved against a base and treated as the relative URI
"abc" (the non-strict option in section 5.2.2).
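To make the two readings concrete, a sketch using java.net.URI (which follows RFC 2396 rather than RFC 3986, but here gives the strict result: a reference with a scheme is returned unchanged). resolveNonStrict is a hypothetical helper, not a Jena API; it mimics the non-strict step by dropping a scheme that matches the base's scheme before resolving.

```java
import java.net.URI;

/** Sketch: strict vs. non-strict resolution of a same-scheme reference like "http:abc". */
public class SameSchemeReference {

    /**
     * Hypothetical non-strict resolution: if the reference has the same scheme
     * as the base, drop the scheme and resolve the remainder as a relative reference.
     */
    public static URI resolveNonStrict(URI base, String ref) {
        URI r = URI.create(ref);
        String scheme = r.getScheme();
        if (scheme != null && scheme.equalsIgnoreCase(base.getScheme()))
            r = URI.create(ref.substring(scheme.length() + 1));
        return base.resolve(r);
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example/dir/file");
        // Strict: "http:abc" has a scheme, so it is returned as-is.
        System.out.println(base.resolve(URI.create("http:abc")));
        // Non-strict: same scheme as the base, so it is resolved as "abc".
        System.out.println(resolveNonStrict(base, "http:abc"));
    }
}
```

The strict reading keeps "http:abc" as an (illegal for http) absolute URI; the non-strict reading silently turns it into http://example/dir/abc, which is arguably worse because the data changes meaning without any error.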
Are there other URI schemes with scheme-specific rules?
Supported so far:
http, urn, urn:uuid:, uuid (never official), file:
What others in use have additional syntactic restrictions? (DIDs, for
example.) [Link to definitive document required]
Andy