On 18/10/12 22:21, Rob Vesse wrote:
Hey Andy
Sorry for taking forever to get back to you on this but comments inline:
On 8/17/12 5:54 AM, "Andy Seaborne" wrote:
I'm at the point of being ready to integrate RIOT and a new reader system
into Jena properly. This means we can remove the old parsers in
jena-core (not ARP).
There is a "but" however.
RIOT supports readers for both triples and quads, and for both models/graphs
and datasets/DatasetGraphs, but the classes for all things quad are in ARQ.
I've created a JIRA issue, but I thought I'd surface it here because it has
the potential to be disruptive.
https://issues.apache.org/jira/browse/JENA-300
== Integration
Possibilities:
1/ Put the code in ARQ
1a/ require a call to ARQ to initialize
1b/ make jena-core do a reflection call to ARQ initialization.
2/ Merge jena-arq and jena-core
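Option 1b could look roughly like the sketch below: jena-core looks up an initializer class by name and, if it is on the classpath, invokes a static no-argument method on it, so jena-core needs no compile-time dependency on ARQ. The class name `org.example.arq.ArqInit` and the method name `init` are hypothetical placeholders, not the real ARQ entry point; `DemoInit` exists only so the sketch runs on its own.

```java
import java.lang.reflect.Method;

public class OptionalInit {
    // Hypothetical class name standing in for an ARQ-side initializer.
    // This is NOT a real Jena/ARQ class.
    static final String ARQ_INIT_CLASS = "org.example.arq.ArqInit";

    /**
     * Calls a public static no-argument method on a class that may not be on
     * the classpath. Returns true if the call was made, false if the class is
     * absent (for example, jena-core used without ARQ).
     */
    public static boolean tryInit(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getMethod(methodName);
            m.invoke(null); // static method, so there is no receiver object
            return true;
        } catch (ClassNotFoundException e) {
            return false; // optional module missing: carry on without it
        } catch (ReflectiveOperationException e) {
            // Class present but method missing or failing: a packaging error.
            throw new IllegalStateException("Cannot initialize " + className, e);
        }
    }

    /** Stand-in initializer so the sketch runs on its own. */
    public static class DemoInit {
        public static int calls = 0;
        public static void init() { calls++; }
    }

    public static void main(String[] args) {
        System.out.println(tryInit(ARQ_INIT_CLASS, "init"));            // false
        System.out.println(tryInit(DemoInit.class.getName(), "init"));  // true
        System.out.println(DemoInit.calls);                             // 1
    }
}
```

The cost of this approach is that a renamed or moved initializer fails silently (it looks like "ARQ absent"), so the class name has to be kept in step by hand.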
The obvious issue for (2) is that the result is a big project to work
with. Whether a larger jena-core really makes a difference in the real
world, I don't know. Long term, some redivision into separate modules
would be good, but it's quite hard to find any breakdown of core concepts
if you want testing by module. It's hard to do anything much without a
memory graph implementation!
If (2), it would be good to time this with making an uber jar
"jena-VERSION.jar" so that people switch to that and don't see any
future reorg of the modules unless they take a detailed look.
How about this as a suggestion for the short term:
- Move Quad and the riot sub-system into jena-core
- Replace the jena-core reader machinery with the riot sub-system
This has the advantage of keeping everything query still in its own
module and does not need to break down core. Ideally it would be nice to
split off the riot sub-system into its own module, but then you get into
problems of there being no reader/writer sub-system in core and requiring
users to pull in an extra dependency for one of the most common things
they are going to do. I assume you plan to integrate this after 2.7.4,
perhaps with a minor version bump, i.e. 2.8.0?
Longer term, I tried to think of some ways to nicely separate things out
but was kinda struggling. With the Model interface as it stands (with its
own read()/write() methods), there is no way to cleanly separate the riot
sub-system out from jena-core/jena-arq in the same way that Sesame
separates its IO subsystem into its RIO modules. They have a
sesame-rio-api module and then specific small modules implementing each
reader.
If we could remove the read()/write() methods from Model then we can start
to get a better separation of concerns:
- Interfaces for reading/writing form a jena-riot-api module
- Implementations form another module jena-riot-std module
In place of read()/write() methods directly on Model, we can provide
a static ModelIO class with read() and write() methods. Wiring up of
readers and writers for use by this could perhaps be done automagically
through some combination of package scanning and Java annotations?
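A minimal sketch of the ModelIO idea, assuming readers register themselves by language name in a static registry. `Model` and `RdfReader` here are simplified hypothetical stand-ins, not Jena's interfaces, and an explicit `register` call takes the place of the package-scanning/annotation discovery, which the discussion leaves open (`java.util.ServiceLoader` would be one standard way to automate that step).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class ModelIO {
    /** Hypothetical stand-in for a model: triples kept as plain strings. */
    public static final class Model {
        public final List<String> triples = new ArrayList<>();
    }

    /** Hypothetical reader contract: parse a character stream into a model. */
    public interface RdfReader {
        void read(Reader in, Model target) throws IOException;
    }

    // Language name (upper case) -> reader implementation.
    private static final Map<String, RdfReader> READERS = new ConcurrentHashMap<>();

    private ModelIO() {} // static facade, no instances

    /** Registration hook; automatic discovery would end up calling this. */
    public static void register(String lang, RdfReader reader) {
        READERS.put(lang.toUpperCase(Locale.ROOT), reader);
    }

    /** Stands in for model.read(...): find the reader for lang and apply it. */
    public static Model read(Model model, Reader in, String lang) throws IOException {
        RdfReader r = READERS.get(lang.toUpperCase(Locale.ROOT));
        if (r == null) {
            throw new IllegalArgumentException("No reader registered for " + lang);
        }
        r.read(in, model);
        return model;
    }

    public static void main(String[] args) throws IOException {
        // Toy reader: treats each non-blank line as one triple.
        register("N-TRIPLES", (in, target) -> {
            BufferedReader br = new BufferedReader(in);
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.trim().isEmpty()) target.triples.add(line.trim());
            }
        });
        Model m = read(new Model(), new StringReader("<s> <p> <o> .\n\n<s> <p> <o2> .\n"), "n-triples");
        System.out.println(m.triples.size()); // 2
    }
}
```

With this shape the Model interface itself carries no IO methods, which is what would let the reader implementations live in a separate module.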
Hope these thoughts help
Rob
Moving just RIOT out of ARQ is the way to go. It's not just Quad - it's
Dataset as well, which is public API. Quad informally is as well, so it
needs to be coupled with a significant version change. While not in the
API, extensions and deep working with ARQ do tend to arrive at Quad.
After that, it's the effect of pulling the thread that yanks more stuff.
The jena-*-api module idea would work, although maybe some testing
might need to be put into a testing module to get the ordering right. It's
hard to test APIs without an implementation to hand.
RIOT would be its own subsystem - it does not need all of jena-core (it
should not need the client API, for example, or OntAPI, but it does need
datatypes).
I'm not convinced about one module per parser because they (this is "not
RDF/XML") share so much, but one module for RIOT would be ideal. I
confess I don't like it when the internal need for a module structure
ends up dictating the public API design - sometimes a public API with a
mix of things is easier to use, even if the mix cuts across the internal
design.
All .read calls do become legacy ways of getting to a library - it is
inverting the structure. Where it bites is WebReader2 - the class of
static functions that reads things. It has both Model and Dataset calls.
Having ModelIO for all Model calls and DatasetIO for all Dataset calls
is a good thought. I'll give that a go and see if the dependencies work out.
But it is a nice example of where internal divisions force public API.
What if you read a web location, not knowing if it's triples or quads?
It would be nicer to get the right thing back than to have to decide
before the call whether it's triples or quads.
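One way to "get the right thing back" without deciding in advance, sketched under assumptions: the parser pushes its output into a sink that accepts both triples and quads, and the read call always returns a dataset. Triples land in the default graph and quads in named graphs, so a caller that only wants a Model can take the default graph. `StreamSink`, `Dataset` and the line-based toy parser are hypothetical illustrations, not RIOT classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReadAnything {
    /** Hypothetical parser output: triple syntaxes call triple(), quad syntaxes call quad(). */
    public interface StreamSink {
        void triple(String s, String p, String o);
        void quad(String g, String s, String p, String o);
    }

    /** Hypothetical minimal dataset that accepts both kinds of parser output. */
    public static final class Dataset implements StreamSink {
        public final List<String> defaultGraph = new ArrayList<>();
        public final Map<String, List<String>> namedGraphs = new LinkedHashMap<>();

        @Override
        public void triple(String s, String p, String o) {
            defaultGraph.add(s + " " + p + " " + o);
        }

        @Override
        public void quad(String g, String s, String p, String o) {
            namedGraphs.computeIfAbsent(g, k -> new ArrayList<>()).add(s + " " + p + " " + o);
        }
    }

    /**
     * Toy line-based parser: 3 terms is a triple, 4 terms is a quad with the
     * graph name last (N-Quads order). Other lines, including blank ones, are
     * skipped. The caller never says in advance whether to expect quads.
     */
    public static Dataset read(String text) {
        Dataset ds = new Dataset();
        for (String line : text.split("\n")) {
            String[] t = line.trim().split("\\s+");
            if (t.length == 3) {
                ds.triple(t[0], t[1], t[2]);
            } else if (t.length == 4) {
                ds.quad(t[3], t[0], t[1], t[2]);
            }
        }
        return ds;
    }

    public static void main(String[] args) {
        Dataset ds = read("<s> <p> <o>\n<s> <p> <o> <g1>\n");
        System.out.println(ds.defaultGraph.size());  // 1
        System.out.println(ds.namedGraphs.keySet()); // [<g1>]
    }
}
```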
All .read calls do become legacy ways of getting to the new reader
structure. riot-reader rewires existing Jena to route the .read calls
to one piece of code. It was one single reader, but two places require
the language to be known in advance, so it is one very thin reader per
language that adds in the default value. The existing code does
newInstance after deciding the language; the new code delays the
language decision until after conneg.
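A sketch of delaying the language decision until the response is in hand. The priority order here (server Content-Type, then file extension, then the default supplied by the thin per-language reader) is an assumption for illustration, as are the small lookup tables; the real riot-reader rules may differ.

```java
import java.util.Locale;
import java.util.Map;

public class LangChooser {
    // Illustrative tables only; a real registry would be larger and extensible.
    static final Map<String, String> BY_CONTENT_TYPE = Map.of(
            "text/turtle", "TURTLE",
            "application/rdf+xml", "RDF/XML",
            "application/n-triples", "N-TRIPLES",
            "application/n-quads", "N-QUADS");
    static final Map<String, String> BY_EXTENSION = Map.of(
            "ttl", "TURTLE", "rdf", "RDF/XML", "nt", "N-TRIPLES", "nq", "N-QUADS");

    /**
     * Picks the syntax only once the connection is open. Assumed order:
     * 1. the Content-Type the server sent (parameters such as charset are
     *    dropped; types not in the table, such as text/plain, fall through),
     * 2. the file extension of the URI,
     * 3. the default supplied by the thin per-language reader.
     */
    public static String chooseLang(String contentType, String uri, String defaultLang) {
        if (contentType != null) {
            String ct = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            String lang = BY_CONTENT_TYPE.get(ct);
            if (lang != null) return lang;
        }
        int dot = uri.lastIndexOf('.');
        if (dot >= 0) {
            String lang = BY_EXTENSION.get(uri.substring(dot + 1).toLowerCase(Locale.ROOT));
            if (lang != null) return lang;
        }
        return defaultLang;
    }

    public static void main(String[] args) {
        System.out.println(chooseLang("text/turtle; charset=utf-8", "http://example/x", null)); // TURTLE
        System.out.println(chooseLang("text/plain", "http://example/data.nq", "TURTLE"));      // N-QUADS
        System.out.println(chooseLang("text/plain", "http://example/data", "TURTLE"));         // TURTLE
    }
}
```

Because the thin per-language reader only contributes the last-resort default, a server that labels its content correctly still wins.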
If we have a single "one jar", then the trick of jena-core making a
reflection call on RIOT to initialize with the RIOT reader will work. The
user will see new code without us beforehand having to undergo a deep
restructure, but eventually the restructure should happen, on a timescale
that is relaxed, not forced by release cycles.
Andy
== Outline of the reader
There is a single class "WebReader2" that captures the process of
opening a connection to a resource/file/thing, deciding the syntax and
then calling the right parser. This adds full HTTP content negotiation
on top of what Jena currently does.
You can add new content-types and connect to the appropriate parser code.
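A minimal sketch of what "add a new content type and connect it to a parser" might involve: one registry entry that both maps a response Content-Type to a parser language and contributes a q-weighted entry to the outgoing Accept header. The class, the q-values and the `application/x-example-rdf`/`EXAMPLE` pair are hypothetical; this is not the WebReader2 API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

public class ContentTypeRegistry {
    // Insertion order is kept so the Accept header lists types predictably.
    private final Map<String, String> typeToLang = new LinkedHashMap<>();
    private final Map<String, Double> typeToQ = new LinkedHashMap<>();

    /** Connects a content type to a parser language, with its Accept q-value. */
    public void register(String contentType, String lang, double q) {
        String ct = contentType.toLowerCase(Locale.ROOT);
        typeToLang.put(ct, lang);
        typeToQ.put(ct, q);
    }

    /** Parser language for a response Content-Type, or null if unknown. */
    public String langFor(String contentType) {
        String ct = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return typeToLang.get(ct);
    }

    /** Accept header for the request; q=1.0 is the default, so it is omitted. */
    public String acceptHeader() {
        StringJoiner header = new StringJoiner(", ");
        for (Map.Entry<String, Double> e : typeToQ.entrySet()) {
            double q = e.getValue();
            header.add(q >= 1.0 ? e.getKey() : e.getKey() + ";q=" + q);
        }
        return header.toString();
    }

    public static void main(String[] args) {
        ContentTypeRegistry reg = new ContentTypeRegistry();
        reg.register("text/turtle", "TURTLE", 1.0);
        reg.register("application/rdf+xml", "RDF/XML", 0.9);
        // A newly added, hypothetical type wired to a hypothetical parser name.
        reg.register("application/x-example-rdf", "EXAMPLE", 0.5);
        System.out.println(reg.acceptHeader());
        // text/turtle, application/rdf+xml;q=0.9, application/x-example-rdf;q=0.5
        System.out.println(reg.langFor("application/x-example-rdf; charset=utf-8")); // EXAMPLE
    }
}
```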
It includes going through FileManager, and if/when that is connected to
model.read, all the conneg, redirection and location mapping becomes
fundamental. You could even make all URLs of a pattern
http://myhost/data/turtle/file{n}
be Turtle files despite being served as text/plain.
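The location-mapping example could work along these lines: an ordered list of URL patterns, each forcing a syntax regardless of what the server reports. This is a sketch, assuming `{n}` in the pattern stands for a number and that a matching rule always overrides the server; `LocationRules` is a hypothetical class, not FileManager's location mapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LocationRules {
    /** One rule: URLs matching the pattern are parsed as the given language. */
    private static final class Rule {
        final Pattern pattern;
        final String lang;
        Rule(Pattern pattern, String lang) { this.pattern = pattern; this.lang = lang; }
    }

    private final List<Rule> rules = new ArrayList<>();

    public void add(String regex, String lang) {
        rules.add(new Rule(Pattern.compile(regex), lang));
    }

    /**
     * First matching rule wins and overrides whatever the server said;
     * otherwise the language derived from the response is kept (may be null).
     */
    public String langFor(String url, String fromServer) {
        for (Rule r : rules) {
            if (r.pattern.matcher(url).matches()) return r.lang;
        }
        return fromServer;
    }

    public static void main(String[] args) {
        LocationRules rules = new LocationRules();
        // Assumes {n} in the pattern stands for a number.
        rules.add("http://myhost/data/turtle/file\\d+", "TURTLE");
        // Served as text/plain, so the server gives no usable syntax (null).
        System.out.println(rules.langFor("http://myhost/data/turtle/file7", null)); // TURTLE
        System.out.println(rules.langFor("http://other/x.rdf", "RDF/XML"));        // RDF/XML
    }
}
```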
== Code
In an "Experimental" project:
https://svn.apache.org/repos/asf/jena/Experimental/riot-reader/
Code browse:
https://svn.apache.org/viewvc/jena/Experimental/riot-reader/src/main/java/riot_reader/
The package layout isn't right for integration.