[
https://issues.apache.org/jira/browse/COMMONSRDF-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388575#comment-14388575
]
Stian Soiland-Reyes commented on COMMONSRDF-5:
----------------------------------------------
+1 - nobody forces Graph to be implemented either - even
RDFTermFactory.createGraph(*) can throw UnsupportedOperationException
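For illustration, a sketch of how an implementation could opt out (assuming createGraph() can be a Java-8 default method on the factory; the exact signature is not settled):
{code}
// Sketch only -- names follow the draft API, but treat them as illustrative.
interface RDFTermFactory {
    // Optional operation: implementations with no Graph backing may reject
    // this call instead of being forced to ship a Graph implementation.
    default Graph createGraph() {
        throw new UnsupportedOperationException("createGraph() not supported");
    }
    // Term creation stays mandatory, e.g.:
    IRI createIRI(String iri);
}
{code}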
--
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718
> Is a graph API the right thing to do?
> -------------------------------------
>
> Key: COMMONSRDF-5
> URL: https://issues.apache.org/jira/browse/COMMONSRDF-5
> Project: Apache Commons RDF
> Issue Type: Wish
> Reporter: Stian Soiland-Reyes
> Priority: Minor
> Labels: client, design, graph, sparql
>
> From https://github.com/commons-rdf/commons-rdf/issues/35
> larsga:
> {quote}
> I have a need for an RDF library in several different projects, and really
> like the idea of a commons-rdf. However, I'm not sure the current proposal
> provides the functionality that is actually necessary.
> How should Java code interact with RDF? The most common case will be via
> SPARQL. So a common SPARQL client library with a convenient API, support for
> persistent connections, SSL, basic auth, etc etc, would be a very valuable
> thing.
> Another case will be to parse (or generate) RDF. For this, a simple streaming
> interface would be perfect.
> An API for accessing RDF as an object model I have to say I'm deeply
> skeptical of, for two reasons. The first reason is that it's very rarely a
> good idea. In the vast majority of cases, your data either is in a database
> or should be in a database. SPARQL is the right answer in these cases.
> The second reason is that I see many people adopting this API approach to RDF
> even when they obviously should not. The reason seems to be that developers
> want an API, and given an API that's what they choose. Even when,
> architecturally, this is crazy. As a point of comparison, it's very rare for
> people to interact with relational data via interfaces named Database,
> Relation, Row, etc. But for RDF this has somehow become the norm. Some of the
> triple store vendors (like Oracle) even boast of supporting the Jena APIs,
> even though one should under no circumstances use APIs of that kind to work
> with triples stored in Oracle.
> So my fear is that an API like the one currently proposed will not only fail
> to provide the functionality that is most commonly needed, but also lead
> developers astray.
> I guess this is probably not the most pleasant feedback to receive, but I
> felt it had to be said. Sorry about that.
> {quote}
> ansell:
> {quote}
> No need to apologise (@wikier and I asked you to expand on your Twitter
> comments!)
> From my perspective, I would love to port (and improve where necessary)
> RDFHandler from Sesame to Commons RDF. However, we felt it was not applicable
> to the first version on which we requested comments, given its very narrow
> scope of relying solely on the RDF-1.1 Abstract Model terminology.
> As you point out, the level of terminology used in the Abstract Model is too
> low for common application usage. @afs has pointed out difficulties with
> using Graph as the actual access layer for a database. In Sesame, the
> equivalent Graph interface is never used for access to permanent data stores,
> only for in-memory filtering and analysis between the database and users,
> which happens fairly often in my applications so I am glad that it exists.
> A good fast portable SPARQL client library would still need an object model
> to represent the results in, to send them to a typesafe API. Before we do
> that we wanted to get the object model to a relatively mature stage.
> From this point on we have a few paths that we can follow to expand out to an
> RDF Streaming API and a SPARQL client library, particularly as we have a
> focus on Java-8 with Lambdas.
> For example, we could have something like:
> {code}
> interface TupleHandler {
>     void handleHeaders(List<String> headers);
>     void handleTuple(List<RDFTerm> row);
> }
>
> interface RDFHandler {
>     void handleRDF(Triple triple);
> }
>
> interface SPARQLClient {
>     // Handler setters return the client so calls can chain, as used below.
>     SPARQLClient tupleHandler(TupleHandler handler);
>     SPARQLClient rdfHandler(RDFHandler handler);
>     boolean ask(String query);
>     void select(String query);
>     void construct(String query);
>     void describe(IRI iri);
>     void describe(String query);
> }
> {code}
> Usage may be:
> {code}
> client.ask(myQuery)
> client.tupleHandler(handler).select(myQuery)
> client.rdfHandler(handler).construct(myQuery)
> client.rdfHandler(handler).describe(IRI)
> client.rdfHandler(handler).describe(myQuery)
> {code}
> Could you suggest a few possible alternative models that would suit you and
> critique that model?
> {quote}
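> Since the sketched RDFHandler has a single abstract method, a Java-8 lambda
> can stand in for it directly (assuming the fluent signatures above):
> {code}
> // Illustrative only: print each triple of a CONSTRUCT result as it streams in.
> client.rdfHandler(triple -> System.out.println(triple))
>       .construct("CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 10");
> {code}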
> afs:
> {quote}
> Commons-RDF allows an application to switch between implementations. A
> variation of @larsga's point is that SPARQL (languages and protocols) gives
> that separation already.
> I would like to hear more as to what is special about a portable SPARQL
> Client because it seems to be a purely local choice for the application. You
> can issue SPARQL queries over JDBC (see jena-jdbc). People are already
> querying DBpedia etc. from Jena or Sesame or JavaScript or Python or ... .
> DBpedia does not impose client choice.
> There are processing models that are not so SPARQL-amenable, such as some
> graph analytics (think map/reduce or RDD), where handling the data at the RDF
> 1.1 data-model level is important; there the RDF graph does matter as a
> concept, because the application wishes to walk the graph, following links.
> What would make working with SPARQL easier, and does not need portability, is
> mini-languages that make SPARQL easier to write in programs, maybe
> specialised to particular usage patterns. There is no need for mega-toolkits
> everywhere.
> @larsga - what's in your ideal RDF library?
> (To Oracle, and others, the "Jena API" includes the SPARQL interface and then
> how to deal with the results.)
> {quote}
> larsga:
> {quote}
> What's special about a common SPARQL client is that none seems to exist in
> Java at the moment. So if commons-rdf could provide one that would be great.
> Getting results via JDBC may be preferable in some cases, but in general it's
> not ideal. How do you get the result as it really was in that case? With data
> type URIs and language tags? How do you get the parsed results of CONSTRUCT
> queries? In addition, the API is not very convenient.
> jena-jdbc requires jena-jdbc-core, which in turn requires ARQ, which then
> requires ... That's a non-starter. If I simply want to send SPARQL queries
> over HTTP having to pull in the entire Jena stack is just not on.
> > There are processing models that are not so SPARQL-amenable, such as some
> > graph analytics (think map/reduce or RDD), where handling the data at the
> > RDF 1.1 data-model level is important; there the RDF graph does matter as a
> > concept, because the application wishes to walk the graph, following links.
> Yes. This is a corner case, though, and it's very far from obvious that a
> full-blown object model for graph traversal is the best way to approach this.
> Or that it will even scale. But never mind that.
> What's missing in the Java/RDF space are the main tools you really need to
> build an RDF application in Java: a streaming API for parsers plus a SPARQL
> client. Something like this can be provided very easily in a very
> light-weight package, and would provide immense value.
> An object model representing the RDF Data Model directly would, imho, do more
> harm than good, simply because it would mislead people into thinking that
> this is the right way to interact with RDF in general.
> {quote}
> afs:
> {quote}
> At the minimum, you don't need anything other than an HTTP client to
> retrieve the JSON!
> If you want to work in RDF concepts in your application, Jena provides
> streaming parsers plus a SPARQL client, as does Sesame. Each provides exactly
> what you describe! Yes, a minimal system would be smaller, but (1) is the
> size difference that critical (solution: strip down a toolkit)? Data is
> large, code is small. And (2) in what way is it not yet-another-toolkit, with
> all that goes with that?
> {quote}
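> To make that concrete: a bare-bones SELECT against a public endpoint really
> needs nothing but HTTP (the endpoint and query here are just examples):
> {code}
> // Plain-HTTP SPARQL protocol request, no RDF toolkit involved.
> String query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10";
> java.net.URL url = new java.net.URL("http://dbpedia.org/sparql?query="
>         + java.net.URLEncoder.encode(query, "UTF-8"));
> java.net.URLConnection conn = url.openConnection();
> // Ask for the SPARQL 1.1 Query Results JSON format.
> conn.setRequestProperty("Accept", "application/sparql-results+json");
> try (java.io.InputStream in = conn.getInputStream()) {
>     // ...parse the JSON results yourself...
> }
> {code}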
> wikier:
> {quote}
> OK, I think now I understand @larsga's point...
> I do agree that SPARQL should, in theory, be such a "common interface". But
> what happens right now is that each library serializes the results using its
> own terms. So one of the goals of commons-rdf would be to align the interfaces
> there too.
> Of course you could always say you can stay decoupled by parsing the results
> yourself. But that has two problems: on the one hand, you are reimplementing
> code you should not have to, and probably making mistakes. On the other hand,
> that only works if your code is not going to be used by anyone else; as soon
> as it is, instead of solving a problem you are causing another one.
> In case this helps the discussion: we discussed the idea of commons-rdf
> because in two consecutive weeks I had to deal with the same problem. I
> needed to provide a simple client library, and I realized that the decision I
> made about which library to use forced people to use that library.
> Those two client libraries are the Redlink SDK and MICO, both with different
> purposes and different targets, but in the end dealing with the same problem.
> https://github.com/redlink-gmbh/redlink-java-sdk
> http://www.mico-project.eu/commons-rdf/
> {quote}
> larsga:
> {quote}
> Yes, this is getting closer to what I meant. As you say, a SPARQL client
> library is fine for stuff like ASK, SELECT, INSERT and so on. The problem is
> CONSTRUCT, or if you want to parse a file. However, even in those cases I do
> not want an in-memory representation of the resulting RDF. I want it
> streamed, kind of like SAX for XML. Then, if I need an in-memory
> representation I will build one from the resulting stream.
> Now if you argue that there will be people for whom an in-memory
> representation is the best choice I guess that's OK. But I think it's wrong
> to force people to go via such a representation. Ideally, I'd like to see:
> * a simple streaming interface,
> * a simple abstraction for parsers and writers,
> * a SPARQL client library that represents RDF as callback streams.
> If there also has to be a full-blown API with Graph, Statement, and the like,
> so be it. But it would IMHO be best if that were layered on top of the rest
> as an option, so that if you wanted you could build such a model by streaming
> statements into it, but you wouldn't be forced to go via those interfaces if
> you didn't want to.
> {quote}
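> As a sketch, the streaming layer larsga describes could be as small as this
> (all names hypothetical, not part of the current proposal):
> {code}
> // SAX-style callback contract: triples are pushed, never accumulated.
> interface StatementHandler {
>     void startRDF();
>     void statement(Triple triple); // called once per parsed/streamed triple
>     void endRDF();
> }
>
> // Parser abstraction: reads a stream and pushes into the handler.
> interface StreamingParser {
>     void parse(java.io.InputStream in, String baseIRI, StatementHandler handler)
>             throws java.io.IOException;
> }
> {code}
> An in-memory Graph would then just be one possible StatementHandler
> implementation, layered on top as the optional extra larsga asks for.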
> ansell:
> {quote}
> I understand that Graph is an abstraction that many people do not need,
> particularly if they are streaming, but Statement seems to be a very useful
> abstraction in an object-oriented language, and it should be very low cost to
> create, even if you are streaming.
> As Andy says, both Sesame and Jena currently offer streaming parsers for both
> SPARQL Results formats and RDF formats, so your main argument seems to be
> satisfied in practice already. The choice is just not interchangeable once
> you have decided which library to use, which is the reason we stopped where
> we did: the current model is at least enough to get streaming parsers going.
> All parts of the API are loosely linked at this point, with a clear
> theoretical model from RDF-1.1. Hence, you don't need to implement or use
> Graph if you just want a streaming API that accepts Statement or a
> combination of the available RDFTerms.
> {quote}
> drlivingstone:
> {quote}
> I think Statement is something that would be essential / useful - it's the
> smallest "functional" piece of RDF. (A use case where you want to iterate
> over parts of a Graph response in units smaller than triples seems weird to
> me - why not use a SELECT query then? But anyway.) Whether Graph
> gets its own Class/API, or whether Statement could be a (potentially
> implicit) quad instead is probably where the different underlying libraries
> will have differing goals.
> Regarding the goals of the library to have common abstractions / vocabulary -
> I would bet most people using RDF are also using (at least some) SPARQL. You
> can build a generic interface for querying and streaming through results that
> covers both Jena and Sesame; I have done so in Clojure, in my KR
> library. This requires more than just agreeing that results are in terms of
> the common RDFTerm class, though: as pointed out above, a common SPARQL API
> is needed to agree on how tuples, graphs, etc. are returned and iterated
> over. But it wasn't that hard to do. Having the underlying library
> maintainers do it for me (possibly more efficiently) would certainly have
> been better. This goes beyond the scope of just defining core RDF terms.
> {quote}
> stain:
> {quote}
> I think the Graph concept is useful - not everyone is accessing pre-existing
> data on a pre-existing SPARQL server. For instance, a light-weight container
> for annotations might want to expose just a couple of Graph instances without
> exposing the underlying RDF framework. Someone who is generating RDF as a
> side-product can chuck their triples in a Graph and then pass it to an
> arbitrary RDF framework for serialization or upload to a LOD server.
> I can see many libraries that would not use Graph, but could use the other
> RDFTerms.
> This would be the case for OWLAPI for instance, which has Ontology as a core
> concept rather than a graph. Operations like Graph.add() don't make much
> sense in general there, as you have to serialize the ontology as RDF before
> you get a graph.
> I don't think it should be a requirement for implementors to provide a Graph
> implementation - thus RDFTermFactory.createGraph() is optional.
> {quote}
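> As a sketch of that "chuck their triples in a Graph" hand-off (method names
> follow the draft API but may differ; the concrete factory class is
> hypothetical):
> {code}
> RDFTermFactory factory = new SomeVendorFactory(); // hypothetical implementation
> Graph graph = factory.createGraph(); // may throw UnsupportedOperationException
> IRI subject = factory.createIRI("http://example.com/annotation/1");
> IRI creator = factory.createIRI("http://purl.org/dc/terms/creator");
> graph.add(factory.createTriple(subject, creator, factory.createLiteral("Alice")));
> // The graph can now be handed to any RDF framework for serialization
> // without leaking that framework's own types into the caller's API.
> {code}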
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)