I completely forgot about:
IRILib.encodeUriComponent(String)
Andy
On 19/06/14 11:09, Andy Seaborne wrote:
On 15/06/14 18:18, Ying Jiang wrote:
1) space and other non-URI characters in column name
I introduce the LangCSV.encodeURIComponent() method borrowed from [1].
However it does not strictly conform to RFC 3986 [2].
TestLangCSV.testNonURICharacters() [7] shows the escaping result.
There is also a related standard, RFC 2396 [3]. I'm confused about how
the two relate.
RFC 2396 is superseded by RFC 3986.
Which one is Jena URI supposed to stick to?
There are other escaping methods in libraries such as spring-web [4],
guava [5] and the old commons-httpclient [6]. Is it OK to make Jena
(jena-arq) depend on one of these libraries?
Jena has some IRI code that may be useful to you:
// Includes punycode for host names!
IRI iri = IRIFactory.iriImplementation()
    .create("http://examplé/foo bar?query=a b") ;
System.out.println(iri.toASCIIString()) ;
iri = IRIFactory.iriImplementation()
    .create("foo bar?query=a b") ;
System.out.println(iri.toASCIIString()) ;
It is not query-string aware: "a b" becomes "a%20b", not "a+b",
but for producing URIs in the CSV case that does not matter (?).
You'll need to be careful about '?' anyway, because a '?' in a column
name has to be %-encoded explicitly.
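
To illustrate the point about '?', here is a minimal sketch of a component
encoder that keeps only the RFC 3986 "unreserved" characters and %-encodes
everything else as UTF-8 bytes. It is not Jena code: the class name
ColumnNameEncoder and the method encodeComponent are hypothetical, chosen
only for this example, and it uses nothing beyond the JDK.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Jena.
public class ColumnNameEncoder {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();
    // RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static final String UNRESERVED_MARKS = "-._~";

    /** Percent-encodes every UTF-8 byte that is not an unreserved character. */
    public static String encodeComponent(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean unreserved = (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || (c < 0x80 && UNRESERVED_MARKS.indexOf(c) >= 0);
            if (unreserved) {
                out.append((char) c);
            } else {
                // Space, '?', '=', and all non-ASCII bytes end up here.
                out.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeComponent("foo bar?query=a b"));
    }
}
```

Because '?' and '=' are not in the unreserved set, a column name that
contains them cannot be mistaken for the start of a query string once it
is placed in a URI.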
Jena already depends on org.apache.httpcomponents.httpclient so that is
no extra dependency.
Andy