I've been working (in my ASF scratch area) on a new I/O subsystem to replace the current one in Jena.

+ Replaces the Turtle and N-Triples parsers completely with the RIOT ones, removing the current jena-core parsers.

+ Adds content negotiation for the syntax when reading from URLs

+ FileManager-like functionality when doing model.read.

It's nearly ready to be merged into jena-core.

This message is an update, and a request for comments and concerns, particularly about any specific things we need to do to ensure compatibility.

Once it is merged into jena-core, a few last things can be done (one or two cases can't be checked with the current RIOT wiring setup because a few things are hard-wired into ModelCom).

An attempt to branch then merge should be possible in the next few weeks.

- - - - - - - -

WebReader is a new class of static methods that does everything through one algorithm. There are lots of table-driven lookups, so adding new languages will be possible - the usual suspects are all added as "extensions" to an empty base setup.

Internally, it's driven by content type. Any "file open" generates a typed stream - the type is the content type (file extensions are used for files). This is different from current Jena, where the language is chosen before any attempt to open a connection is made.
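To make the table-driven idea concrete, here is a minimal sketch of such a registry. It is not the WebReader code: the class and method names (LangTableSketch, register, contentTypeForName, langForContentType) are hypothetical, and the table contents (content types, extensions, language names) are illustrative assumptions only.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a table-driven language registry; not the real WebReader. */
final class LangTableSketch {
    // file extension -> content type, and content type -> language name
    private final Map<String, String> typeByExt = new LinkedHashMap<>();
    private final Map<String, String> langByType = new LinkedHashMap<>();

    /** Add one language as an "extension" to the (initially empty) tables. */
    void register(String lang, String contentType, String... extensions) {
        String type = contentType.toLowerCase(Locale.ROOT);
        langByType.put(type, lang);
        for (String ext : extensions)
            typeByExt.put(ext.toLowerCase(Locale.ROOT), type);
    }

    /** Content type guessed from a file name or URL path; null if unknown. */
    String contentTypeForName(String name) {
        int slash = name.lastIndexOf('/');
        int dot = name.lastIndexOf('.');
        // No extension if there is no dot in the last path segment.
        if (dot <= slash || dot == name.length() - 1)
            return null;
        return typeByExt.get(name.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Language for a content type, ignoring parameters such as charset. */
    String langForContentType(String contentType) {
        if (contentType == null)
            return null;
        int semi = contentType.indexOf(';');
        String base = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim().toLowerCase(Locale.ROOT);
        return langByType.get(base);
    }

    /** Empty base setup plus the usual suspects (entries are assumptions). */
    static LangTableSketch withUsualSuspects() {
        LangTableSketch t = new LangTableSketch();
        t.register("RDF/XML", "application/rdf+xml", "rdf");
        t.register("TURTLE", "text/turtle", "ttl");
        t.register("N-TRIPLE", "application/n-triples", "nt");
        return t;
    }
}
```

With tables like these, a file open only needs contentTypeForName to type the stream, and the parser choice is then a single lookup on the content type, whether it came from HTTP or from a file extension.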

1/ All file opening will go via the filemanager, including model.read(url), so it covers HTTP, files (and Java resources if we want to - it's just how the default filemanager is set up).

2/ model.read(url) does content negotiation over HTTP and looks at the file extension for files. It also looks at the URL extension when the content type is text/plain, on the basis that dropping files in a directory on a web server means that they are served as text/plain.

3/ RDFReaderF (the factory part) would be removed.

As I discovered, a lot of stuff is hardwired anyway because there is static use of RDFReaderFImpl and one model.read operation has file opening hardwired.

4/ Backwards compatibility for ARP

When asking for an RDFReader for RDF/XML, a special reader is returned which wraps the current ARP reader, so setting properties for a custom reader works. It does, however, fix up things that look like file names by adding "file:".

This is used only when RDF/XML is explicitly requested. Otherwise, conneg and file extension guessing happen. The file extension is the basic way of choosing the content type for files.

No other languages have any settable reader features. There is a universal RDFReader for everything so model.getReader(lang) works.

5/ model.read is a compatibility wrapper.

WebReader.read(...) is the key operation, inverting the idea that models can have specialised readers - I'm not aware of this being used at all and in fact think it's not possible because some things are built into ModelCom.

ModelCom has:
  private static final RDFReaderF readerFactory = new RDFReaderFImpl();

so the only variation that works is via model.getReader(lang).


What to do about difference of opinion as to the MIME type ....

The model.read(..."lang"...) operations regard lang as a hint. Into the mix go the stream content type and the hint. For files, the file extension sets the content type.

But they can disagree so what's the best thing to believe?

If the content type is text/plain, then the hint language is used, because dropping a file in a directory on a webserver is likely to end up with it being served as text/plain. File extension is used for HTTP (!!).

If the content type is not text/plain, at the moment the hint language is ignored.

I was tempted to say that the hint language overrides anything discovered; this gets files with one extension which are actually another right (use case: ntriples that is really turtle).

But that gets the HTTP case wrong: ask for "foo", hint that it is RDF/XML, and get back explicitly application/turtle (the right answer is now TTL).

So to force the type, open a stream to the thing and then pass the (untyped) stream and a hint. It's possible to force a particular reader but you have to do it a certain way.
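The rule as it currently stands can be sketched roughly as follows. This is an illustration under assumptions, not the WebReader implementation: HintResolutionSketch, chooseLang and langFromExtension are hypothetical names, the two small tables are illustrative, and trying the hint before the URL extension in the text/plain case is my assumption about ordering. An untyped stream is modelled here as a null content type.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the content type vs. hint rule; not the real WebReader. */
final class HintResolutionSketch {
    // Illustrative tables only; the real tables are not shown in this message.
    private static final Map<String, String> LANG_BY_TYPE = new HashMap<>();
    private static final Map<String, String> LANG_BY_EXT = new HashMap<>();
    static {
        LANG_BY_TYPE.put("application/rdf+xml", "RDF/XML");
        LANG_BY_TYPE.put("text/turtle", "TURTLE");
        LANG_BY_EXT.put("rdf", "RDF/XML");
        LANG_BY_EXT.put("ttl", "TURTLE");
        LANG_BY_EXT.put("nt", "N-TRIPLE");
    }

    /** Choose a language from the stream content type, the name, and the hint. */
    static String chooseLang(String contentType, String name, String hintLang) {
        // Untyped stream: nothing was discovered, so the hint decides.
        if (contentType == null)
            return hintLang;
        int semi = contentType.indexOf(';');
        String base = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim().toLowerCase(Locale.ROOT);
        // text/plain tells us little: use the hint, else the extension
        // (this ordering is an assumption).
        if (base.equals("text/plain"))
            return hintLang != null ? hintLang : langFromExtension(name);
        // Any other content type wins and the hint is ignored.
        return LANG_BY_TYPE.get(base);
    }

    /** Language guessed from the extension of a file name or URL; null if none. */
    static String langFromExtension(String name) {
        if (name == null)
            return null;
        int slash = name.lastIndexOf('/');
        int dot = name.lastIndexOf('.');
        if (dot <= slash || dot == name.length() - 1)
            return null;
        return LANG_BY_EXT.get(name.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Served as text/plain: the hint is believed.
        System.out.println(chooseLang("text/plain", "http://example.org/foo", "TURTLE"));
        // Explicit Turtle content type: the RDF/XML hint is ignored.
        System.out.println(chooseLang("text/turtle", "http://example.org/foo", "RDF/XML"));
        // Untyped stream plus hint: this is how to force a reader.
        System.out.println(chooseLang(null, null, "RDF/XML"));
    }
}
```

The last call in main corresponds to forcing the type: the caller opens the stream itself, so no content type is attached and the hint is the only information available.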

And assume people don't use .ttl for RDF/XML files very much.

This may need fine tuning in the light of experience.

        Andy
