RIOT / jena-core integration

Andy Seaborne Mon, 16 Jul 2012 10:19:50 -0700

I've been working (in my ASF scratch area) on a new I/O subsystem to replace the current one in Jena.

+ Replaces the Turtle and N-Triples parsers completely with the RIOT ones, removing the current jena-core parsers.


+ Adds content negotiation for the syntax when reading from URLs

+ FileManager-like functionality when doing model.read.

It's nearly ready to be merged into jena-core.

This message is an update, and a request for comments and concerns, particularly any specific things we need to ensure compatibility.

Once it is merged into jena-core, a few last things can be done (one or two cases can't be checked with the current RIOT wiring because a few things are hard-wired into ModelCom).


An attempt to branch then merge should be possible in the next few weeks.

- - - - - - - -

WebReader is a new class of static methods that does everything through one algorithm. There are lots of table-driven lookups, so adding new languages will be possible - the usual suspects are all added as "extensions" to an empty base setup.

Internally, it's driven by content type. Any "file open" generates a typed stream - the type is the content type (file extensions are used for files). This is different from current Jena, where the language is chosen before any attempt to open a connection is made.
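As a rough illustration of the table-driven, content-type-first idea, here is a minimal sketch in plain Java. It is not the WebReader code: the class name, the method names and the table contents are all made up for the example, and the media type strings are simply the registered ones for each syntax, not necessarily what RIOT registers.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; none of these names are Jena or RIOT API.
final class TypedStreamSketch {
    // Table-driven lookup: file extension -> content type.
    private static final Map<String, String> EXT_TO_CONTENT_TYPE = new HashMap<>();

    static {
        // Start from an empty table and add the usual suspects as extensions.
        register("rdf", "application/rdf+xml");
        register("ttl", "text/turtle");
        register("nt", "application/n-triples");
    }

    // Adding a new language is one more table entry.
    static void register(String extension, String contentType) {
        EXT_TO_CONTENT_TYPE.put(extension.toLowerCase(Locale.ROOT), contentType);
    }

    // Returns the content type implied by the name's extension, or null if unknown.
    static String contentTypeForName(String name) {
        int slash = name.lastIndexOf('/');
        int dot = name.lastIndexOf('.');
        if (dot <= slash || dot == name.length() - 1) {
            return null; // no extension in the last path segment
        }
        String extension = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXT_TO_CONTENT_TYPE.get(extension);
    }

    public static void main(String[] args) {
        System.out.println(contentTypeForName("data/example.ttl"));
        System.out.println(contentTypeForName("http://example.org/foo"));
    }
}
```

The point of the sketch is only that the type is attached to the opened thing, and the parser is picked afterwards from that type, rather than the language being fixed before the open.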

1/ All file opening will go via the FileManager, including model.read(url), so it covers HTTP, files (and Java resources if we want to - it's just how the default FileManager is set up).

2/ model.read(url) does content negotiation over HTTP and looks at the file extension for files. It also looks at the URL extension when the content type is text/plain, on the basis that dropping files in a directory on a web server means that they are served as text/plain.


3/ RDFReaderF (the factory part) would be removed.

As I discovered, a lot of stuff is hardwired anyway because there is static use of RDFReaderFImpl and one model.read operation has file opening hardwired.


4/ Backwards compatibility for ARP

When asking for an RDFReader for RDF/XML, a special reader is returned which wraps the current ARP reader, so setting properties for a custom reader works. But it does fix up file-looking names by adding "file:".

This is used only when RDF/XML is explicitly requested. Otherwise, conneg and file extension guessing happens. For files, the extension is the basic way of choosing the content type.

No other languages have any settable reader features. There is a universal RDFReader for everything, so model.getReader(lang) works.


5/ model.read is a compatibility wrapper.

WebReader.read(...) is the key operation, inverting the idea that models can have specialised readers - I'm not aware of this being used at all, and in fact I think it's not possible because some things are built into ModelCom.


ModelCom has:
  private static final RDFReaderF readerFactory = new RDFReaderFImpl();

so only variation by model.getReader(lang) works


What to do about difference of opinion as to the MIME type ....

The model.read(..."lang"...) operations regard lang as a hint. Into the mix go the stream content type and the hint. For files, the file extension sets the content type.


But they can disagree so what's the best thing to believe?

If the content type is text/plain, then the hint language is used, because dropping a file in a directory on a webserver is likely to end up with it being served as text/plain. The file extension is used for HTTP in this case (!!).

If the content type is not text/plain, at the moment the hint language is ignored.

I was tempted to say that the hint language overrides anything discovered; this gets files right when they have one extension but are actually another language (use case: ntriples that is really turtle).

But that gets the HTTP case wrong: ask for "foo", hint that it is RDF/XML, and get back an explicit application/turtle. The right answer is now TTL, so letting the hint override is wrong.

So to force the type, open a stream to the thing and then pass the (untyped) stream and a hint. It's possible to force a particular reader but you have to do it a certain way.


And assume people don't use .ttl for RDF/XML files very much.

This may need fine tuning in the light of experience.
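To make the current rule concrete, here is a minimal sketch of the decision in plain Java. It is not the WebReader code: the class, method and language labels are hypothetical, and two points are assumptions rather than anything settled above, namely that for text/plain the hint is tried before the extension, and that an unrecognised content type yields no language.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; none of these names are Jena or RIOT API.
final class LangChoiceSketch {
    // Table-driven lookup: content type -> language label (labels are illustrative).
    private static final Map<String, String> CONTENT_TYPE_TO_LANG = new HashMap<>();

    static {
        CONTENT_TYPE_TO_LANG.put("application/rdf+xml", "RDF/XML");
        CONTENT_TYPE_TO_LANG.put("text/turtle", "TTL");
        CONTENT_TYPE_TO_LANG.put("application/turtle", "TTL"); // the type named in the example above
        CONTENT_TYPE_TO_LANG.put("application/n-triples", "N-TRIPLES");
    }

    // contentType: from the typed stream, or null for an untyped stream.
    // hintLang: the "lang" argument, or null. extensionLang: guessed from the name, or null.
    static String chooseLang(String contentType, String hintLang, String extensionLang) {
        String type = baseType(contentType);
        if (type == null || type.equals("text/plain")) {
            // Untyped stream or text/plain: the hint is believed.
            // Assumption: the extension is only the fallback when there is no hint.
            return hintLang != null ? hintLang : extensionLang;
        }
        // Any other content type wins and the hint is ignored.
        // Assumption: an unrecognised type gives null rather than falling back.
        return CONTENT_TYPE_TO_LANG.get(type);
    }

    // Drops parameters such as "; charset=utf-8" and normalises case.
    private static String baseType(String contentType) {
        if (contentType == null) {
            return null;
        }
        int semi = contentType.indexOf(';');
        String base = semi < 0 ? contentType : contentType.substring(0, semi);
        return base.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // The HTTP case above: hint says RDF/XML, server says application/turtle.
        System.out.println(chooseLang("application/turtle", "RDF/XML", null));
        // Forcing the type: an untyped stream plus a hint.
        System.out.println(chooseLang(null, "RDF/XML", null));
    }
}
```

If the hint were allowed to override everything, only the first branch would change, which is the part most likely to need tuning.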

        Andy
