Stian Soiland-Reyes created COMMONSRDF-5:
--------------------------------------------
Summary: Is a graph API the right thing to do?
Key: COMMONSRDF-5
URL: https://issues.apache.org/jira/browse/COMMONSRDF-5
Project: Apache Commons RDF
Issue Type: Wish
Reporter: Stian Soiland-Reyes
Priority: Minor
From https://github.com/commons-rdf/commons-rdf/issues/35
larsga:
{quote}
I have a need for an RDF library in several different projects, and really like
the idea of a commons-rdf. However, I'm not sure the current proposal provides
the functionality that is actually necessary.
How should Java code interact with RDF? The most common case will be via
SPARQL. So a common SPARQL client library with a convenient API, support for
persistent connections, SSL, basic auth, etc etc, would be a very valuable
thing.
Another case will be to parse (or generate RDF). For this, a simple streaming
interface would be perfect.
An API for accessing RDF as an object model I have to say I'm deeply skeptical
of, for two reasons. The first reason is that it's very rarely a good idea. In
the vast majority of cases, your data either is in a database or should be in a
database. SPARQL is the right answer in these cases.
The second reason is that I see many people adopting this API approach to RDF
even when they obviously should not. The reason seems to be that developers
want an API, and given an API that's what they choose. Even when,
architecturally, this is crazy. As a point of comparison, it's very rare for
people to interact with relational data via interfaces named Database, Relation,
Row, etc. But for RDF this has somehow become the norm. Some of the triple
store vendors (like Oracle) even boast of supporting the Jena APIs, even though
one should under no circumstances use APIs of that kind to work with triples
stored in Oracle.
So my fear is that an API like the one currently proposed will not only fail to
provide the functionality that is most commonly needed, but also lead
developers astray.
I guess this is probably not the most pleasant feedback to receive, but I felt
it had to be said. Sorry about that.
{quote}
ansell:
{quote}
No need to apologise (@wikier and I asked you to expand on your Twitter
comments!)
From my perspective, I would love to port (and improve where necessary)
RDFHandler from Sesame to Commons RDF. However, we felt that it was not
applicable to the first version that we requested comments on, given its very
narrow scope of solely relying on the RDF-1.1 Abstract Model terminology.
As you point out, the level of terminology used in the Abstract Model is too
low for common application usage. @afs has pointed out difficulties with using
Graph as the actual access layer for a database. In Sesame, the equivalent
Graph interface is never used for access to permanent data stores, only for
in-memory filtering and analysis between the database and users, which happens
fairly often in my applications so I am glad that it exists.
A good, fast, portable SPARQL client library would still need an object model
to represent the results in, in order to expose them through a typesafe API.
Before we do that, we wanted to get the object model to a relatively mature
stage.
From this point on we have a few paths that we can follow to expand out to an
RDF Streaming API and a SPARQL client library, particularly as we have a focus
on Java-8 with Lambdas.
For example, we could have something like:
{code}
interface TupleHandler {
    void handleHeaders(List<String> headers);
    void handleTuple(List<RDFTerm> row);
}

interface RDFHandler {
    void handleRDF(Triple triple);
}

interface SPARQLClient {
    // handler registration returns the client so calls can be chained,
    // as in the usage example below
    SPARQLClient tupleHandler(TupleHandler handler);
    SPARQLClient rdfHandler(RDFHandler handler);
    boolean ask(String query);
    void select(String query);
    void construct(String query);
    void describe(IRI iri);
    void describe(String query);
}
{code}
Usage may be:
{code}
client.ask(myQuery)
client.tupleHandler(handler).select(myQuery)
client.rdfHandler(handler).construct(myQuery)
client.rdfHandler(handler).describe(IRI)
client.rdfHandler(handler).describe(myQuery)
{code}
Could you suggest a few possible alternative models that would suit you and
critique that model?
{quote}
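(Purely as an illustration of the Java-8-with-lambdas angle in the sketch above, usage might look roughly like the following. The SPARQLClients.connect() factory and the endpoint URL are invented here for the example; only RDFHandler is a single-method interface, so TupleHandler still needs an anonymous class, or a default handleHeaders(), to be lambda-friendly.)
{code}
// Hypothetical usage sketch; SPARQLClients.connect() is an invented factory.
SPARQLClient client = SPARQLClients.connect("http://example.org/sparql");

// RDFHandler has a single method, so a lambda works directly
client.rdfHandler(triple -> System.out.println(triple))
      .construct("CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 10");

// TupleHandler has two methods, so it needs an anonymous class here
client.tupleHandler(new TupleHandler() {
          public void handleHeaders(List<String> headers) { /* column names */ }
          public void handleTuple(List<RDFTerm> row) { System.out.println(row); }
      })
      .select("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10");
{code}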
afs:
{quote}
Commons-RDF allows an application to switch between implementations. A variation
of @larsga's point is that SPARQL (languages and protocols) gives that
separation already.
I would like to hear more as to what is special about a portable SPARQL Client
because it seems to be a purely local choice for the application. You can issue
SPARQL queries over JDBC (see jena-jdbc). People are already querying DBpedia
etc from Jena or Sesame or javascript or python or ... . DBpedia does not
impose client choice.
There are processing models that are not so SPARQL-amenable, such as some graph
analytics (think map/reduce or RDD), where handling the data at the RDF 1.1
data model level is important, and then the RDF graph does matter as a concept
because the application wishes to walk the graph, following links.
What would make working with SPARQL easier, but does not need portability, is
mini-languages that make SPARQL easier to write in programs, maybe specialised
to particular usage patterns. There is no need for mega-toolkits everywhere.
@larsga - what's in your ideal RDF library?
(To Oracle, and others, the "Jena API" includes the SPARQL interface and then
how to deal with the results.)
{quote}
larsga:
{quote}
What's special about a common SPARQL client is that none seems to exist in Java
at the moment. So if commons-rdf could provide one that would be great.
Getting results via JDBC may be preferable in some cases, but in general it's
not ideal. How do you get the result as it really was in that case? With data
type URIs and language tags? How do you get the parsed results of CONSTRUCT
queries? In addition, the API is not very convenient.
jena-jdbc requires jena-jdbc-core, which in turn requires ARQ, which then
requires ... That's a non-starter. If I simply want to send SPARQL queries over
HTTP, having to pull in the entire Jena stack is just not on.
> There are processing models that are not so SPARQL-amenable, such as some
> graph analytics (think map/reduce or RDD), where handling the data at the RDF
> 1.1 data model level is important, and then the RDF graph does matter as a
> concept because the application wishes to walk the graph, following links.
Yes. This is a corner case, though, and it's very far from obvious that a
full-blown object model for graph traversal is the best way to approach this.
Or that it will even scale. But never mind that.
What's missing in the Java/RDF space are the main tools you really need to build
an RDF application in Java: a streaming API for parsers plus a SPARQL client.
Something like this can be provided very easily in a very lightweight package,
and would provide immense value.
An object model representing the RDF Data Model directly would, imho, do more
harm than good, simply because it would mislead people into thinking that this
is the right way to interact with RDF in general.
{quote}
afs:
{quote}
At the minimum, you don't need anything other than an HTTP client to retrieve
the JSON!
If you want to work in RDF concepts in your application, Jena provides
streaming parsers plus a SPARQL client, as does Sesame. Each provides exactly
what you describe! Yes, a minimal system would be smaller, but (1) is the size
difference that critical (solution: strip down a toolkit)? Data is large, code
is small. And (2) in what way is it not yet-another-toolkit, with all that goes
with that?
{quote}
wikier:
{quote}
OK, I think now I understand @larsga's point...
I do agree that, in theory, SPARQL should be such a "common interface". But what
happens right now is that each library serializes the results using its own
terms. So one of the goals of commons-rdf would be to align the interfaces
there too.
Of course you could always say you can stay decoupled by parsing the results
yourself. But that has two problems: on the one hand, you are reimplementing
code you should not need to, and probably making mistakes. On the other hand,
that only works if your code is not going to be used by anyone else; as soon as
it is, instead of solving a problem you are causing another one.
In case this helps the discussion: we discussed the idea of commons-rdf because
in two consecutive weeks I had to deal with the same problem. I needed to
provide a simple client library, and I realized that my choice of underlying
library forced people to use that library.
Those two client libraries are the Redlink SDK and MICO. Both with different
purposes and different targets, but in the end dealing with the same problem.
https://github.com/redlink-gmbh/redlink-java-sdk
http://www.mico-project.eu/commons-rdf/
{quote}
larsga:
{quote}
Yes, this is getting closer to what I meant. As you say, a SPARQL client
library is fine for stuff like ASK, SELECT, INSERT and so on. The problem is
CONSTRUCT, or if you want to parse a file. However, even in those cases I do
not want an in-memory representation of the resulting RDF. I want it streamed,
kind of like SAX for XML. Then, if I need an in-memory representation I will
build one from the resulting stream.
Now if you argue that there will be people for whom an in-memory representation
is the best choice I guess that's OK. But I think it's wrong to force people to
go via such a representation. Ideally, I'd like to see:
* a simple streaming interface,
* a simple abstraction for parsers and writers,
* a SPARQL client library that represents RDF as callback streams.
If there also has to be a full-blown API with Graph, Statement, and the like,
so be it. But it would IMHO be best if that were layered on top of the rest as
an option, so that if you wanted you could build such a model by streaming
statements into it, but you wouldn't be forced to go via those interfaces if
you didn't want to.
{quote}
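(A minimal sketch of the layering described above, assuming a SAX-style stream of statements with an in-memory Graph as just one optional consumer of that stream; the handler names here are invented for illustration, not part of any proposal.)
{code}
// Hypothetical sketch: a SAX-like statement stream, with Graph as one optional sink.
interface StatementHandler {
    void startRDF();
    void handleStatement(Triple statement);
    void endRDF();
}

// Building an in-memory Graph is just one possible consumer of the stream;
// a parser or SPARQL CONSTRUCT result could feed any StatementHandler directly.
class GraphBuildingHandler implements StatementHandler {
    private final Graph graph;

    GraphBuildingHandler(Graph graph) {
        this.graph = graph;
    }

    public void startRDF() { }

    public void handleStatement(Triple statement) {
        graph.add(statement);   // only materialise in memory if you asked for it
    }

    public void endRDF() { }
}
{code}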
ansell:
{quote}
I understand that Graph is an abstraction that many people do not need,
particularly if they are streaming, but Statement seems to be a very useful
abstraction in an object oriented language, and it should be very low cost to
create, even if you are streaming.
As Andy says, both Sesame and Jena currently offer streaming parsers for both
SPARQL Results formats and RDF formats, so what you mainly ask for already seems
possible in practice. The choice just isn't interchangeable once you decide
which library to use, which is why we stopped where we did: the current model is
at least enough to get streaming parsers going.
All parts of the API are loosely linked at this point, with a clear theoretical
model from RDF-1.1. Hence, you don't need to implement or use Graph if you just
want a streaming API that accepts Statement or a combination of the available
RDFTerms.
{quote}
drlivingstone:
{quote}
I think Statement seems like something that would be essential / useful - it's
the smallest "functional" piece of RDF. (A use case where you want to iterate
over parts of a Graph response in units smaller than triples seems weird to me -
why not use a SELECT query then? But anyway.) Whether Graph gets
its own Class/API, or whether Statement could be a (potentially implicit) quad
instead is probably where the different underlying libraries will have
differing goals.
Regarding the goals of the library to have common abstractions / vocabulary - I
would bet most people using RDF are also using (at least some) SPARQL. You can
build a generic interface for querying and streaming through results that
covers both Jena and Sesame; I have done so in Clojure, in my KR library. This
requires more than just agreeing that results are in terms of the common RDFTerm
class, though: as pointed out above, a common SPARQL API is needed to agree on
how tuples, graphs, etc. are returned and iterated over. But it wasn't that hard
to do. Having the underlying library maintainers do it for me (possibly more
efficiently) would certainly have been better. This goes beyond
the scope of just defining core RDF terms though.
{quote}
stain:
{quote}
I think the Graph concept is useful - not everyone is accessing pre-existing
data on a pre-existing SPARQL server. For instance, a light-weight container
for annotations might want to expose just a couple of Graph instances without
exposing the underlying RDF framework. Someone who is generating RDF as a
side-product can chuck their triples in a Graph and then pass it to an arbitrary
RDF framework for serialization or for sending to a LOD server.
I can see many libraries that would not use Graph, but could use the other
RDFTerms.
This would be the case for OWLAPI for instance, which has Ontology as a core
concept rather than a graph. Operations like Graph.add() don't make much sense
in general there, as you have to serialize the ontology as RDF before you get a
graph.
I don't think it should be a requirement for implementors to provide a Graph
implementation - thus RDFTermFactory.createGraph() is optional.
{quote}
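(One hedged way of expressing that optionality in Java 8, assuming a factory along the lines discussed in this thread; the method signatures below are only an illustration, not the agreed API.)
{code}
// Hypothetical sketch: a factory where createGraph() is optional for implementors.
// A Java 8 default method lets term-only implementations simply not override it.
interface RDFTermFactory {
    IRI createIRI(String iri);
    BlankNode createBlankNode();
    Literal createLiteral(String lexicalForm);
    Triple createTriple(BlankNodeOrIRI subject, IRI predicate, RDFTerm object);

    default Graph createGraph() {
        throw new UnsupportedOperationException("createGraph() not supported");
    }
}
{code}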
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)