I've been working (in my ASF scratch area) on a new I/O subsystem to
replace the current one in Jena.
+ Replaces the Turtle and N-Triples parsers completely with the RIOT
ones, removing the current jena-core parsers.
+ Adds content negotiation for the syntax when reading from URLs
+ FileManager-like functionality when doing model.read.
It's nearly ready to be merged into jena-core.
This message is an update, and a request for comments and concerns,
particularly any specific things we need to do to ensure compatibility.
Once it is merged into jena-core, a few last things can be done (one or
two cases can't be checked with the current RIOT wiring setup because a
few things are hard-wired into ModelCom).
An attempt to branch then merge should be possible in the next few weeks.
- - - - - - - -
WebReader is a new class of static methods that does everything through
one algorithm. There are lots of table-driven look ups so adding new
languages will be possible - the usual suspects are all added as
"extensions" to an empty base setup.
Internally, it's driven by content type. Any "file open" generates a
typed stream - the type is the content type (file extensions are used for
files). This is different from current Jena where the language is
chosen before any attempt to open a connection is made.
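To make the table-driven idea concrete, here is a minimal sketch of a registry that starts empty and has each language added as an "extension". It is not the WebReader code: the class name, method names and the content type strings are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a table-driven language registry (not the actual WebReader code). */
final class LangRegistrySketch {
    // Both tables start empty; every language is added as an "extension".
    private static final Map<String, String> extToContentType = new HashMap<>();
    private static final Map<String, String> contentTypeToLang = new HashMap<>();

    /** Register a language with its content type and its file extensions. */
    static void register(String lang, String contentType, String... extensions) {
        contentTypeToLang.put(contentType, lang);
        for (String ext : extensions) {
            extToContentType.put(ext.toLowerCase(Locale.ROOT), contentType);
        }
    }

    static {
        // The usual suspects; names and content types here are illustrative choices.
        register("RDF/XML", "application/rdf+xml", "rdf", "owl", "xml");
        register("TURTLE", "text/turtle", "ttl");
        register("N-TRIPLES", "application/n-triples", "nt");
    }

    /** Content type for a file name, taken from its extension; null if unknown. */
    static String contentTypeForFile(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return null; // no extension at all
        }
        String ext = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return extToContentType.get(ext);
    }

    /** Language registered for a content type; null if unknown. */
    static String langForContentType(String contentType) {
        return contentTypeToLang.get(contentType);
    }
}
```

Adding a new language is then one more register(...) call, and a file open can label its stream with contentTypeForFile(name) before any parser is chosen.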
1/ All file opening will go via the filemanager, including
model.read(url), so it covers HTTP, files (and Java resources if we want
to - it's just how the default filemanager is set up).
2/ model.read(url) does content negotiation over HTTP and looks at the
file extension for files. And it looks at the URL extension when the
content type is text/plain, on the basis that dropping files in a
directory on a web server means that they are served as text/plain.
3/ RDFReaderF (the factory part) would be removed.
As I discovered, a lot of stuff is hardwired anyway because there is
static use of RDFReaderFImpl and one model.read operation has file
opening hardwired.
4/ Backwards compatibility for ARP
When asking for an RDF Reader for RDF/XML, a special reader is returned
which wraps the current ARP reader so setting properties for a custom
reader works. But it does fix up names that look like files by adding
"file:". This is used only when RDF/XML is explicitly requested.
Otherwise, conneg and file extension guessing happens. For files, the
file extension is the basic way of choosing the content type.
No other languages have any settable reader features. There is a
universal RDFReader for everything so model.getReader(lang) works.
5/ model.read is a compatibility wrapper.
WebReader.read(...) is the key operation, inverting the idea that models
can have specialised readers - I'm not aware of this being used at all
and in fact think it's not possible because some things are built into
ModelCom.
ModelCom has:
private static final RDFReaderF readerFactory = new RDFReaderFImpl();
so only variation by model.getReader(lang) works.
What to do about difference of opinion as to the MIME type ....
The model.read(..."lang"...) operations regard lang as a hint. Into the
mix go the stream content type and the hint. For files, the file
extension sets the content type.
But they can disagree so what's the best thing to believe?
If the content type is text/plain, then the hint language is used,
because dropping a file in a directory on a webserver is likely to end up
with it being served as text/plain. File extension is used for HTTP (!!).
If the content type is not text/plain, at the moment the hint language
is ignored.
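The rule as it stands can be sketched roughly as follows. This is an illustration only: the class, the method names and the small content type table are hypothetical, and the order of the two text/plain fallbacks (hint first, then URL extension) is an assumption of the sketch. It also assumes a bare media type with no parameters such as charset.

```java
import java.util.Locale;

/** Hypothetical sketch of the hint vs. content type rule (not the actual WebReader code). */
final class LangChoiceSketch {
    static final String TEXT_PLAIN = "text/plain";

    /** Tiny stand-in for a content type table; the entries are illustrative. */
    static String langOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        switch (contentType.toLowerCase(Locale.ROOT)) {
            case "application/rdf+xml": return "RDF/XML";
            case "text/turtle":         return "TURTLE";
            default:                    return null;
        }
    }

    /**
     * @param contentType   content type of the typed stream, or null if untyped
     * @param hintLang      the "lang" argument given to model.read, or null
     * @param extensionLang language guessed from the URL extension, or null
     * @return the language to parse with, or null if nothing is known
     */
    static String chooseLang(String contentType, String hintLang, String extensionLang) {
        boolean uninformative =
            contentType == null || TEXT_PLAIN.equalsIgnoreCase(contentType);
        if (uninformative) {
            // text/plain, or no type at all: believe the hint, then the extension.
            // (The order of these two fallbacks is an assumption of this sketch.)
            return hintLang != null ? hintLang : extensionLang;
        }
        // A specific content type wins and the hint is ignored.
        return langOf(contentType);
    }
}
```

With this shape, chooseLang("text/turtle", "RDF/XML", null) picks Turtle despite the hint, while an untyped stream (contentType == null) always follows the hint.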
I was tempted to say that the hint language overrides anything
discovered; this gets right the files that have one extension but are
actually another language (use case: ntriples that is really turtle).
But it gets the httpd case wrong: ask for "foo", hint that it is RDF/XML,
and get back explicitly application/turtle (the right answer is now
TTL).
So to force the type, open a stream to the thing and then pass the
(untyped) stream and a hint. It's possible to force a particular reader
but you have to do it a certain way.
And assume people don't use .ttl for RDF/XML files very much.
This may need fine tuning in the light of experience.
Andy