[ 
https://issues.apache.org/jira/browse/COMMONSRDF-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved COMMONSRDF-5.
------------------------------------
       Resolution: Done
    Fix Version/s: 0.1
         Assignee: Andy Seaborne

This is an interesting discussion point and we can continue on the mailing 
list.  As a JIRA item though, there seems to be agreement that this is the 
initial objective of commonsrdf.  Resolved as "done".

> Is a graph API the right thing to do?
> -------------------------------------
>
>                 Key: COMMONSRDF-5
>                 URL: https://issues.apache.org/jira/browse/COMMONSRDF-5
>             Project: Apache Commons RDF
>          Issue Type: Wish
>            Reporter: Stian Soiland-Reyes (old)
>            Assignee: Andy Seaborne
>            Priority: Minor
>              Labels: client, design, graph, sparql
>             Fix For: 0.1
>
>
> From https://github.com/commons-rdf/commons-rdf/issues/35
> larsga:
> {quote}
> I have a need for an RDF library in several different projects, and really 
> like the idea of a commons-rdf. However, I'm not sure the current proposal 
> provides the functionality that is actually necessary.
> How should Java code interact with RDF? The most common case will be via 
> SPARQL. So a common SPARQL client library with a convenient API, support for 
> persistent connections, SSL, basic auth, etc etc, would be a very valuable 
> thing.
> Another case will be to parse (or generate) RDF. For this, a simple streaming 
> interface would be perfect.
> An API for accessing RDF as an object model I have to say I'm deeply 
> skeptical of, for two reasons. The first reason is that it's very rarely a 
> good idea. In the vast majority of cases, your data either is in a database 
> or should be in a database. SPARQL is the right answer in these cases.
> The second reason is that I see many people adopting this API approach to RDF 
> even when they obviously should not. The reason seems to be that developers 
> want an API, and given an API that's what they choose. Even when, 
> architecturally, this is crazy. As a point of comparison, it's very rare for 
> people to interact with relational data via interfaces named Database, 
> Relation, Row, etc. But for RDF this has somehow become the norm. Some of the 
> triple store vendors (like Oracle) even boast of supporting the Jena APIs, 
> even though one should under no circumstances use APIs of that kind to work 
> with triples stored in Oracle.
> So my fear is that an API like the one currently proposed will not only fail 
> to provide the functionality that is most commonly needed, but also lead 
> developers astray.
> I guess this is probably not the most pleasant feedback to receive, but I 
> felt it had to be said. Sorry about that.
> {quote}
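> For illustration, the simple streaming interface larsga asks for might look 
> like the following minimal sketch with SAX-style callbacks (StreamingRDFHandler 
> and its method names are hypothetical, not part of the current proposal; Triple 
> is the proposed Commons RDF term):
> {code}
> // Hypothetical SAX-like callback interface for streaming RDF: a parser
> // calls startRDF() once, then handleTriple() per statement, then endRDF().
> // No in-memory graph is ever built unless the handler builds one.
> interface StreamingRDFHandler {
>   void startRDF();
>   void handleTriple(Triple triple);
>   void handleNamespace(String prefix, String iri);
>   void endRDF();
> }
> {code}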
> ansell:
> {quote}
> No need to apologise (@wikier and I asked you to expand on your Twitter 
> comments!)
> From my perspective, I would love to port (and improve where necessary) 
> RDFHandler from Sesame to Commons RDF. However, we felt that it was not 
> applicable in the first version for which we requested comments, given its 
> very narrow scope of relying solely on the RDF-1.1 Abstract Model 
> terminology.
> As you point out, the level of terminology used in the Abstract Model is too 
> low for common application usage. @afs has pointed out difficulties with 
> using Graph as the actual access layer for a database. In Sesame, the 
> equivalent Graph interface is never used for access to permanent data stores, 
> only for in-memory filtering and analysis between the database and users, 
> which happens fairly often in my applications so I am glad that it exists.
> A good fast portable SPARQL client library would still need an object model 
> to represent the results in, to send them to a typesafe API. Before we do 
> that we wanted to get the object model to a relatively mature stage.
> From this point on we have a few paths that we can follow to expand out to an 
> RDF Streaming API and a SPARQL client library, particularly as we have a 
> focus on Java-8 with Lambdas.
> For example, we could have something like:
> {code}
> interface TupleHandler {
>   void handleHeaders(List<String> headers);
>   void handleTuple(List<RDFTerm> row);
> }
> interface RDFHandler {
>   void handleRDF(Triple triple);
> }
> interface SPARQLClient {
>   // handler registration returns the client so that calls can be
>   // chained, as in the usage below
>   SPARQLClient tupleHandler(TupleHandler handler);
>   SPARQLClient rdfHandler(RDFHandler handler);
>   boolean ask(String query);
>   void select(String query);
>   void construct(String query);
>   void describe(IRI iri);
>   void describe(String query);
> }
> {code}
> Usage may be:
> {code}
> client.ask(myQuery);
> client.tupleHandler(handler).select(myQuery);
> client.rdfHandler(handler).construct(myQuery);
> client.rdfHandler(handler).describe(myIri);
> client.rdfHandler(handler).describe(myQuery);
> {code}
> Could you suggest a few possible alternative models that would suit you and 
> critique that model?
> {quote}
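> With the Java-8 focus mentioned above, RDFHandler (a single abstract method) 
> could be satisfied by a lambda; a hedged usage sketch, assuming the fluent 
> chaining from the example and a client constructed elsewhere:
> {code}
> // Collect CONSTRUCT results with a lambda standing in for RDFHandler.
> // TupleHandler has two methods, so it would still need a named or
> // anonymous class rather than a lambda.
> List<Triple> collected = new ArrayList<>();
> client.rdfHandler(triple -> collected.add(triple))
>       .construct("CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 10");
> {code}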
> afs:
> {quote}
> Commons-RDF allows an application to switch between implementations. A 
> variation of @larsga's point is that SPARQL (languages and protocols) gives 
> that separation already.
> I would like to hear more as to what is special about a portable SPARQL 
> Client because it seems to be a purely local choice for the application. You 
> can issue SPARQL queries over JDBC (see jena-jdbc). People are already 
> querying DBpedia etc from Jena or Sesame or javascript or python or ... . 
> DBpedia does not impose client choice.
> There are processing models that are not so SPARQL-amenable, such as some 
> graph analytics (think map/reduce or RDD), where handling the data at the RDF 
> 1.1 data model is important, and there the RDF graph does matter as a concept 
> because the application wishes to walk the graph, following links.
> What would make working with SPARQL easier, but does not need portability, 
> is mini-languages that make SPARQL easier to write in programs, maybe 
> specialised to particular usage patterns. There is no need for mega-toolkits 
> everywhere.
> @larsga - what's in your ideal RDF library?
> (To Oracle, and others, the "Jena API" includes the SPARQL interface and then 
> how to deal with the results.)
> {quote}
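> As a sketch of the kind of mini-language suggested here, a small fluent 
> builder could assemble SPARQL text without pulling in any toolkit 
> (SparqlBuilder is a hypothetical name, not an existing API):
> {code}
> // Hypothetical builder that only concatenates SPARQL text; it needs no
> // RDF object model at all, just strings.
> String query = new SparqlBuilder()
>     .prefix("foaf", "http://xmlns.com/foaf/0.1/")
>     .select("?name")
>     .where("?person foaf:name ?name")
>     .limit(10)
>     .build();
> // yields: PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> //         SELECT ?name WHERE { ?person foaf:name ?name } LIMIT 10
> {code}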
> larsga:
> {quote}
> What's special about a common SPARQL client is that none seems to exist in 
> Java at the moment. So if commons-rdf could provide one that would be great.
> Getting results via JDBC may be preferable in some cases, but in general it's 
> not ideal. How do you get the result as it really was in that case? With data 
> type URIs and language tags? How do you get the parsed results of CONSTRUCT 
> queries? In addition, the API is not very convenient.
> jena-jdbc requires jena-jdbc-core, which in turn requires ARQ, which then 
> requires ... That's a non-starter. If I simply want to send SPARQL queries 
> over HTTP having to pull in the entire Jena stack is just not on.
> > There are processing models that are not so SPARQL-amenable, such as some 
> > graph analytics (think map/reduce or RDD), where handling the data at the 
> > RDF 1.1 data model is important, and there the RDF graph does matter as a 
> > concept because the application wishes to walk the graph, following links.
> Yes. This is a corner case, though, and it's very far from obvious that a 
> full-blown object model for graph traversal is the best way to approach this. 
> Or that it will even scale. But never mind that.
> What's missing in the Java/RDF space are the main tools you really need to 
> build an RDF application in Java: streaming API to parsers plus a SPARQL 
> client. Something like this can be provided very easily in a very 
> light-weight package, and would provide immense value.
> An object model representing the RDF Data Model directly would, imho, do more 
> harm than good, simply because it would mislead people into thinking that 
> this is the right way to interact with RDF in general.
> {quote}
> afs:
> {quote}
> At the minimum, you don't need anything other than an HTTP client to 
> retrieve the JSON!
> If you want to work in RDF concepts in your application, Jena provides 
> streaming parsers plus a SPARQL client as does Sesame. Each provides exactly 
> what you describe! Yes, a system that was minimal would be smaller, but (1) is 
> the size difference that critical (solution: strip down a toolkit)? Data is 
> large, code is small. And (2) in what way is it not yet-another-toolkit, with 
> all that goes with that?
> {quote}
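> Taken literally, that minimal approach needs only the JDK; a sketch of 
> querying a public endpoint such as DBpedia over plain HTTP (inside some 
> method, with checked IOExceptions omitted for brevity):
> {code}
> // Uses java.net.HttpURLConnection, java.net.URL, java.net.URLEncoder
> // and java.io.InputStream from the JDK only.
> // Send a SELECT query and read SPARQL JSON results; no RDF toolkit is
> // involved until the application parses the JSON itself.
> String endpoint = "http://dbpedia.org/sparql";
> String sparql = "SELECT * WHERE { ?s ?p ?o } LIMIT 5";
> URL url = new URL(endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8"));
> HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> conn.setRequestProperty("Accept", "application/sparql-results+json");
> try (InputStream in = conn.getInputStream()) {
>     // hand the raw JSON stream to whichever JSON parser the application uses
> }
> {code}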
> wikier:
> {quote}
> OK, I think now I understand @larsga's point...
> I do agree that SPARQL should in theory be such a "common interface". But what 
> happens right now is that each library serializes the results using its own 
> terms. So one of the goals of commons-rdf would be to align the interfaces 
> there too.
> Of course you could always say you can be decoupled by parsing the results by 
> yourself. But that has two problems: On the one hand, you are reimplementing 
> code you do not need to and probably making mistakes. On the other hand, that 
> only works if your code is not going to be used by anyone else; as soon as 
> it's going to be used, instead of solving a problem you are causing another 
> one.
> In case this helps for the discussion, we discussed the idea of commons-rdf 
> because in two consecutive weeks I had to deal with the same problem: I needed 
> to provide a simple client library and I realized the decision I made in 
> terms of which library I chose forced people to use that library.
> Those two client libraries are the Redlink SDK and MICO. Both with different 
> purposes and different targets, but in the end dealing with the same problem.
> https://github.com/redlink-gmbh/redlink-java-sdk
> http://www.mico-project.eu/commons-rdf/
> {quote}
> larsga:
> {quote}
> Yes, this is getting closer to what I meant. As you say, a SPARQL client 
> library is fine for stuff like ASK, SELECT, INSERT and so on. The problem is 
> CONSTRUCT, or if you want to parse a file. However, even in those cases I do 
> not want an in-memory representation of the resulting RDF. I want it 
> streamed, kind of like SAX for XML. Then, if I need an in-memory 
> representation I will build one from the resulting stream.
> Now if you argue that there will be people for whom an in-memory 
> representation is the best choice I guess that's OK. But I think it's wrong 
> to force people to go via such a representation. Ideally, I'd like to see:
> *    a simple streaming interface,
> *    a simple abstraction for parsers and writers,
> *    a SPARQL client library that represents RDF as callback streams.
> If there also has to be a full-blown API with Graph, Statement, and the like, 
> so be it. But it would IMHO be best if that were layered on top of the rest 
> as an option, so that if you wanted you could build such a model by streaming 
> statements into it, but you wouldn't be forced to go via those interfaces if 
> you didn't want to.
> {quote}
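> The layering asked for here is cheap to sketch: an in-memory Graph becomes 
> just one more consumer of the stream (reusing the hypothetical 
> StreamingRDFHandler from the sketch further up, plus the proposed Graph 
> interface):
> {code}
> // A handler that accumulates the stream into a Graph, for callers who do
> // want an in-memory model; streaming callers use a different handler and
> // never touch Graph at all.
> class GraphBuildingHandler implements StreamingRDFHandler {
>   private final Graph graph;
>   GraphBuildingHandler(Graph graph) { this.graph = graph; }
>   public void startRDF() {}
>   public void handleTriple(Triple triple) { graph.add(triple); }
>   public void handleNamespace(String prefix, String iri) {}
>   public void endRDF() {}
> }
> {code}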
> ansell:
> {quote}
> I understand that Graph is an abstraction that many people do not need, 
> particularly if they are streaming, but Statement seems to be a very useful 
> abstraction in an object oriented language, and it should be very low cost to 
> create, even if you are streaming.
> As Andy says, both Sesame and Jena currently offer streaming parsers for both 
> SPARQL Results formats and RDF formats, so your main argument right now seems 
> to be possible in practice already. The choice is just not interchangeable 
> after you decide which library to use at this point, which is the reason we 
> stopped where we did: the current model is at least enough to get streaming 
> parsers going.
> All parts of the API are loosely linked at this point, with a clear 
> theoretical model from RDF-1.1. Hence, you don't need to implement or use 
> Graph if you just want a streaming API that accepts Statement or a 
> combination of the available RDFTerms.
> {quote}
> drlivingstone:
> {quote}
> I think Statement seems like something that would be essential / useful - it's 
> the smallest "functional" piece of RDF. (A use case where you want to iterate 
> over parts of a Graph response that are in units smaller than triples seems 
> weird to me - why not use a SELECT query then? But anyway.) Whether Graph 
> gets its own Class/API, or whether Statement could be a (potentially 
> implicit) quad instead is probably where the different underlying libraries 
> will have differing goals.
> Regarding the goals of the library to have common abstractions / vocabulary - 
> I would bet most people using RDF are also using (at least some) SPARQL. You 
> can build a generic interface for querying and streaming through results that 
> covers both Jena and Sesame, I have done so in Clojure anyway, in my KR 
> library. This requires more than just agreeing that results are in terms of 
> the common RDFTerm class, though; as pointed out above, a common SPARQL API is 
> needed to agree on how tuples or graphs etc. are returned/iterated over. 
> But it wasn't that hard to do. Having the underlying library maintainers do it 
> for me (possibly more efficiently) would have certainly been better. This 
> goes beyond the scope of just defining core RDF terms though.
> {quote}
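> The generic interface described could be quite small; a hypothetical sketch of 
> one that a Jena-backed or Sesame-backed implementation could equally satisfy 
> (QueryExecutor is an illustrative name):
> {code}
> import java.util.Map;
>
> // SELECT results are exposed as an Iterable of variable-to-term bindings,
> // expressed only in the common RDFTerm vocabulary, so callers never see
> // the underlying toolkit's classes.
> interface QueryExecutor {
>   Iterable<Map<String, RDFTerm>> select(String sparql);
> }
> {code}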
> stain:
> {quote}
> I think the Graph concept is useful - not everyone is accessing pre-existing 
> data on a pre-existing SPARQL server. For instance, a light-weight container 
> for annotations might want to expose just a couple of Graph instances without 
> exposing the underlying RDF framework. Someone who is generating RDF as a 
> side-product can chuck their triples in a Graph and then pass it to an 
> arbitrary RDF framework for serialization or for sending to a LOD server.
> I can see many libraries that would not use Graph, but could use the other 
> RDFTerms.
> This would be the case for OWLAPI for instance, which has Ontology as a core 
> concept rather than a graph. Operations like Graph.add() don't make much 
> sense in general there, as you have to serialize the ontology as RDF before 
> you get a graph.
> I don't think it should be a requirement for implementors to provide a Graph 
> implementation - thus RDFTermFactory.createGraph() is optional.
> {quote}
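> One way to keep createGraph() optional within a single factory interface would 
> be a Java-8 default method (a sketch along the lines discussed, not a settled 
> design):
> {code}
> interface RDFTermFactory {
>   IRI createIRI(String iri);
>   // ... other term factory methods ...
>
>   // Optional operation: implementations without a Graph (e.g. one backed
>   // by OWLAPI) simply inherit this default.
>   default Graph createGraph() {
>     throw new UnsupportedOperationException("createGraph() not supported");
>   }
> }
> {code}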



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)