On 15/06/14 18:18, Ying Jiang wrote:
> 1) Spaces and other non-URI characters in column names
> I introduce the LangCSV.encodeURIComponent() method, borrowed from [1].
> However, it does not strictly conform to RFC 3986 [2].
> TestLangCSV.testNonURICharacters() [7] shows the escaping result.
> There is also another related standard, RFC 2396 [3]. I'm confused by
> them.

RFC 2396 is superseded by RFC 3986.

> Which one is Jena URI supposed to stick to?
> There are other escaping methods in libraries such as spring-web [4],
> guava [5] and the old commons-httpclient [6]. Is it OK to make Jena
> (jena-arq) depend on one of these libs?


Jena has some IRI code that may be useful to you:

// Includes punycode for host names!
IRI iri = IRIFactory.iriImplementation()
                    .create("http://examplé/foo bar?query=a b") ;
System.out.println(iri.toASCIIString()) ;

iri = IRIFactory.iriImplementation()
                    .create("foo bar?query=a b") ;
System.out.println(iri.toASCIIString()) ;

It's not query-string sensitive: "a b" becomes "a%20b", not "a+b". For producing URIs in the CSV case, I don't think that matters (?).
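
To illustrate the "%20" versus "+" difference using only the JDK (not the Jena IRI code above), here is a small sketch. The class name SpaceEncodingDemo, its method names, and the host "example.org" are made up for the example. java.net.URLEncoder applies HTML form encoding, while the multi-argument java.net.URI constructor quotes illegal characters as plain percent-escapes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of Jena.
class SpaceEncodingDemo {

    // Form encoding (application/x-www-form-urlencoded): space becomes '+'.
    static String formEncoded(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Generic URI quoting: space becomes "%20" in both path and query.
    // "example.org" is an assumed placeholder host.
    static String uriEncoded(String path, String query) {
        try {
            return new URI("http", "example.org", path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncoded("a b"));                   // a+b
        System.out.println(uriEncoded("/foo bar", "query=a b"));  // http://example.org/foo%20bar?query=a%20b
    }
}
```

Either form decodes back to a space when the receiver parses a query string, but only "%20" is safe in the path or in a URI that is used as an opaque identifier.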

You'll need to be careful about '?' anyway: it otherwise marks the start of the query string, so you'll need to %-encode it specially.
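
As a rough sketch of what that special handling might look like, the following percent-encodes everything outside the RFC 3986 "unreserved" set, so '?' becomes "%3F" and a space becomes "%20". This is only an illustration: the class ColumnNameEncoder and the method encodeComponent are hypothetical names, not LangCSV's actual code, and it assumes column names are encoded as UTF-8 before escaping.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jena.
class ColumnNameEncoder {

    // RFC 3986 section 2.3 "unreserved" characters; everything else is escaped.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encodeComponent(String s) {
        StringBuilder sb = new StringBuilder();
        // Work on UTF-8 bytes so non-ASCII characters become multi-byte escapes.
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && UNRESERVED.indexOf((char) v) >= 0) {
                sb.append((char) v);
            } else {
                sb.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeComponent("unit price?"));  // unit%20price%3F
    }
}
```

Because this escapes reserved characters such as '?', '#' and '/', it is only suitable for a single component (for example a column name appended to a base URI), not for a whole URI.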

Jena already depends on org.apache.httpcomponents.httpclient, so using that would add no extra dependency.

        Andy
