Pascal Christoph created JENA-457:
-------------------------------------
Summary: ntriples: Object-URIs should be %-encoded
Key: JENA-457
URL: https://issues.apache.org/jira/browse/JENA-457
Project: Apache Jena
Issue Type: Improvement
Components: ARQ, Jena, RDF API
Affects Versions: ARQ 2.9.3
Environment: everywhere
Reporter: Pascal Christoph
Priority: Minor
N-Triples serialization is pure ASCII for now [1], so IRIs (see RFC 3987)
cannot be written as-is, because non-ASCII characters are not allowed.
Serializing a Model to N-Triples therefore escapes non-ASCII characters with
'\u' escaping. In most cases the resulting URIs do not resolve, e.g. at
DBpedia. Three different notations are possible:
1. http://de.dbpedia.org/resource/T\u00FCr
2. http://de.dbpedia.org/resource/T%fcr
3. http://de.dbpedia.org/resource/Tür
Notation 1 does not resolve and notation 3 is not ASCII, whereas notation 2
(percent-encoded octets) fulfills both requirements. So I would like notation 2
to be used for encoding object URIs in ASCII N-Triples serialization. See also
https://answers.semanticweb.com/questions/18508/best-way-to-encode-uri-refsiris-for-n-triples
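To make the difference between notation 1 and notation 2 concrete, here is a
small sketch using only the JDK. The class and method names are made up for
illustration and are not part of Jena. Note that java.net.URI percent-encodes
the UTF-8 octets (the mapping described in RFC 3987), so it yields T%C3%BCr
rather than the Latin-1 form T%fcr shown above; which of the two a given
server resolves is a separate question.

```java
import java.net.URI;

public class IriNotations {

    // Notation 1: replace every non-ASCII char with a backslash-u escape.
    // Sketch only: handles BMP characters; supplementary characters would
    // need the 8-digit escape form in N-Triples.
    static String toUnicodeEscaped(String iri) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < iri.length(); i++) {
            char c = iri.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    // Notation 2: percent-encode the UTF-8 octets of non-ASCII characters.
    static String toPercentEncoded(String iri) {
        return URI.create(iri).toASCIIString();
    }

    public static void main(String[] args) {
        String iri = "http://de.dbpedia.org/resource/T\u00FCr"; // notation 3
        System.out.println(toUnicodeEscaped(iri));
        System.out.println(toPercentEncoded(iri));
    }
}
```

Both results are pure ASCII, but only the second one is itself a URI that can
be dereferenced without first undoing the escaping.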
One could use Jena to serialize as Turtle and convert that Turtle file to
N-Triples with rapper. But rapper only converts the Unicode escape sequences
in literals to UTF-8 and leaves URIs untouched (wisely, since they are
identifiers). So this does not help.
The code path responsible for this serialization is reached via:

    RDFWriter fasterWriter = model.getWriter("N-TRIPLE");

It should be safe to apply a patch like this in NTripleWriter.java:

    private static void writeURIString(String s, PrintWriter writer) {
        // Note: URIUtil.encodeQuery declares the checked URIException,
        // so this still needs a try/catch (or a throws clause) to compile.
        writer.print(org.apache.commons.httpclient.util.URIUtil.encodeQuery(s));
    }

(not tested)
What do you think?
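For discussion, a dependency-free variant of the same idea might look like the
sketch below. It is an assumption about how the method could behave, not the
actual NTripleWriter code: it leaves printable ASCII (including an existing
'%') untouched, so already-encoded URIs are not encoded twice, and it
percent-encodes everything else as UTF-8 octets. The class name is made up,
and ASCII characters that are not allowed in an N-Triples URI reference (such
as '<' or '>') are deliberately not handled here.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

public class PercentEncodingSketch {

    // Hypothetical replacement body for writeURIString: percent-encode
    // every code point outside printable ASCII using its UTF-8 octets.
    static void writeURIString(String s, PrintWriter writer) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp > 0x20 && cp < 0x7F) {
                sb.appendCodePoint(cp); // printable ASCII passes through
            } else {
                byte[] utf8 = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    sb.append(String.format("%%%02X", b & 0xFF));
                }
            }
        }
        writer.print(sb);
    }

    // Small helper for trying the method out without a real output stream.
    static String encode(String s) {
        StringWriter out = new StringWriter();
        PrintWriter pw = new PrintWriter(out);
        writeURIString(s, pw);
        pw.flush();
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("http://de.dbpedia.org/resource/T\u00FCr"));
        System.out.println(encode("http://de.dbpedia.org/resource/T%fcr"));
    }
}
```

The second call in main shows why passing '%' through matters: a URI that is
already percent-encoded comes out unchanged.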
-o
[1] See the month-old W3C note that proposes using UTF-8 instead of ASCII:
http://www.w3.org/TR/2013/NOTE-n-triples-20130409/#n-triple-changes