Could you raise this as an issue so we can focus the discussion?

The Jira project has not been created yet (
https://issues.apache.org/jira/browse/INFRA-9245 ) - but I assume we
would import the GitHub issues one way or another?



On 23 March 2015 at 10:25, Reto Gmür <[email protected]> wrote:
> Right now the API on GitHub says nothing about the identity and hashCode of
> any term. In order to have interoperable implementations it is essential to
> define the value of hashCode and the identity conditions for the RDF terms
> which are not locally scoped, i.e. for IRIs and Literals.
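> As a hedged sketch of what such a definition could look like (the class
> name SimpleIri and the exact contract below are illustrative assumptions,
> not taken from the Commons RDF API): equality depends solely on the IRI
> string, and hashCode delegates to the IRI string's hashCode, so that
> independent implementations agree in hash-based collections.

```java
import java.util.Objects;

// Illustrative sketch only: SimpleIri and this exact equals/hashCode
// contract are assumptions, not the Commons RDF API. Identity depends
// solely on the lexical form of the IRI, so two independent
// implementations agree on equality and hashCode.
final class SimpleIri {
    private final String iriString;

    SimpleIri(String iriString) {
        this.iriString = Objects.requireNonNull(iriString);
    }

    String getIRIString() {
        return iriString;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof SimpleIri)) {
            return false;
        }
        // Equality is defined purely on the IRI string.
        return iriString.equals(((SimpleIri) other).getIRIString());
    }

    @Override
    public int hashCode() {
        // Fixed, documented definition: the hashCode of the IRI string.
        return iriString.hashCode();
    }
}

public class Main {
    public static void main(String[] args) {
        SimpleIri a = new SimpleIri("http://example.com/thing");
        SimpleIri b = new SimpleIri("http://example.com/thing");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```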
>
> I suggest taking the definitions from the Clerezza RDF commons.
>
> Reto
>
> On Mon, Mar 23, 2015 at 10:18 AM, Stian Soiland-Reyes <[email protected]>
> wrote:
>
>> OK - I can see that settling BlankNode equality can take some more time
>> (also considering the SPARQL example).
>>
>> So then we must keep the "internalIdentifier" and the abstract concept
>> of the "local scope" for the next release.
>>
>> In which case this one should also be applied:
>>
>> https://github.com/commons-rdf/commons-rdf/pull/48/files
>> and perhaps:
>> https://github.com/commons-rdf/commons-rdf/pull/61/files
>>
>>
>>
>> I would then need to fix the simple GraphImpl.add() to clone the
>> BlankNodes and change their local scope, as otherwise it would wrongly
>> merge graph1.b1 and graph2.b1 (both having the same internalIdentifier
>> and the abstract Local Scope of being in the same Graph). This can
>> happen when doing, say, a copy from one graph to another.
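>> A minimal sketch of that fix (names like ScopedBlankNode and the
>> IdentityHashMap-based mapping are assumptions for illustration, not the
>> actual GraphImpl code): on add, each foreign BlankNode instance is
>> mapped to a fresh node local to the receiving graph, consistently per
>> instance, so colliding internalIdentifiers can never merge.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Illustrative sketch, not the actual GraphImpl: a blank node carries an
// internalIdentifier, but two nodes from different graphs must stay
// distinct even if those identifiers collide.
final class ScopedBlankNode {
    final String internalIdentifier;

    ScopedBlankNode(String internalIdentifier) {
        this.internalIdentifier = internalIdentifier;
    }
}

final class GraphScope {
    // Identity-based map: the *instance* of the foreign node is the key,
    // so the same node maps consistently, while equal-looking nodes from
    // other scopes get fresh local counterparts.
    private final Map<ScopedBlankNode, ScopedBlankNode> localScope =
            new IdentityHashMap<>();
    private int counter = 0;

    ScopedBlankNode toLocal(ScopedBlankNode node) {
        return localScope.computeIfAbsent(node,
                n -> new ScopedBlankNode("local-" + counter++));
    }
}

public class Main {
    public static void main(String[] args) {
        GraphScope g = new GraphScope();
        ScopedBlankNode graph1b1 = new ScopedBlankNode("b1");
        ScopedBlankNode graph2b1 = new ScopedBlankNode("b1");
        // Same internalIdentifier, different scopes: must NOT merge.
        System.out.println(g.toLocal(graph1b1) != g.toLocal(graph2b1));
        // The same instance maps to the same local node.
        System.out.println(g.toLocal(graph1b1) == g.toLocal(graph1b1));
    }
}
```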
>>
>> Raised and detailed in
>> https://github.com/commons-rdf/commons-rdf/issues/66
>> Adding this to the tests sounds crucial, and would help us later
>> when sorting this out.
>>
>>
>> This is in no way a complete resolution. (New bugs would arise; e.g.
>> you could add a triple with a BlankNode and then not be able to remove
>> it afterwards with the same arguments.)
>>
>>
>>
>>
>>
>> On 22 March 2015 at 21:00, Peter Ansell <[email protected]> wrote:
>> > +1
>> >
>> > Although it is not urgent to release a 1.0 version, it is urgent to
>> > release (and keep releasing often) what we have changed since 0.0.2 so
>> > we can start experimenting with it, particularly since I have started
>> > working more intently on Sesame 4 in the last few weeks. Stian's pull
>> > requests to change the BNode situation could wait until after 0.0.3 is
>> > released, at this point.
>> >
>> > Cheers,
>> >
>> > Peter
>> >
>> > On 21 March 2015 at 22:37, Andy Seaborne <[email protected]> wrote:
>> >> I agree with Sergio that releasing something is important.
>> >>
>> >> We need to release, then independent groups can start to build on it. We
>> >> have grounded requirements and a wider community.
>> >>
>> >>         Andy
>> >>
>> >>
>> >> On 21/03/15 09:10, Reto Gmür wrote:
>> >>>
>> >>> Hi Sergio,
>> >>>
>> >>> I don't see where an urgent agenda comes from. Several RDF APIs are
>> >>> already out there, so a new API essentially needs to be better rather
>> >>> than done with urgency.
>> >>>
>> >>> The SPARQL implementation is less something that needs to be part of
>> >>> the first release and more something that helps validate the API
>> >>> proposal. We should validate our API against many possible use cases
>> >>> and then discuss which are more important to support. In my opinion,
>> >>> for an RDF API it is more important that it can be used with remote
>> >>> repositories over standard protocols than that it supports
>> >>> Hadoop-style processing across many machines [1], but maybe we can
>> >>> support both use cases.
>> >>>
>> >>> In any case I think it's good to have prototypical implementations of
>> >>> use cases to see which API features are needed and which are
>> >>> problematic. So I would encourage writing prototype use cases where
>> >>> Hadoop-style processing shows the need for an exposed blank node ID,
>> >>> or a prototype showing that IRI is better as an interface than as a
>> >>> class, etc.
>> >>>
>> >>> In the end we need to decide on the API features based on the use
>> >>> cases they are required by, or compatible with. But it's hard to see
>> >>> the requirements without prototypical code.
>> >>>
>> >>> Cheers,
>> >>> Reto
>> >>>
>> >>> 1.
>> >>> https://github.com/commons-rdf/commons-rdf/pull/48#issuecomment-72689214
>> >>>
>> >>> On Fri, Mar 20, 2015 at 8:30 PM, Sergio Fernández <[email protected]>
>> >>> wrote:
>> >>>
>> >>>> I perfectly understand what you target. But still, FMPOV it is
>> >>>> outside our urgent agenda. Not because it is not interesting, just
>> >>>> because there are more urgent things to deal with. I think the most
>> >>>> important thing is to get running with what we have, and get a
>> >>>> release out. But, as I said, we can discuss it.
>> >>>>
>> >>>>
>> >>>> On 20/03/15 19:10, Reto Gmür wrote:
>> >>>>
>> >>>>> Just a little usage example to illustrate Stian's point:
>> >>>>>
>> >>>>> public class Main {
>> >>>>>     public static void main(String... args) {
>> >>>>>         Graph g = new SparqlGraph("http://dbpedia.org/sparql");
>> >>>>>         Iterator<Triple> iter = g.filter(
>> >>>>>                 new Iri("http://dbpedia.org/ontology/Planet"),
>> >>>>>                 new Iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
>> >>>>>                 null);
>> >>>>>         while (iter.hasNext()) {
>> >>>>>             System.out.println(iter.next().getObject());
>> >>>>>         }
>> >>>>>     }
>> >>>>> }
>> >>>>>
>> >>>>> I think with Stian's version using streams the above could be
>> >>>>> shorter and nicer. But the important part is that the above allows
>> >>>>> using DBpedia as a graph without worrying about SPARQL.
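>> >>>>> For comparison, a hedged sketch of such a stream-based variant.
>> >>>>> The stand-in Iri, Triple and Graph types below, and the
>> >>>>> stream(s, p, o) method with null as wildcard, are assumptions for
>> >>>>> illustration only - not the proposed API:

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Stream;

// Stand-in types for illustration only - not the Commons RDF API.
final class Iri {
    final String iri;
    Iri(String iri) { this.iri = iri; }
    @Override public boolean equals(Object o) {
        return o instanceof Iri && ((Iri) o).iri.equals(iri);
    }
    @Override public int hashCode() { return iri.hashCode(); }
    @Override public String toString() { return "<" + iri + ">"; }
}

final class Triple {
    final Iri subject, predicate;
    final Object object;
    Triple(Iri subject, Iri predicate, Object object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
    Object getObject() { return object; }
}

final class Graph {
    private final List<Triple> triples;
    Graph(List<Triple> triples) { this.triples = triples; }

    // Assumed stream-based filter; a null argument acts as a wildcard.
    Stream<Triple> stream(Iri s, Iri p, Object o) {
        return triples.stream()
                .filter(t -> s == null || t.subject.equals(s))
                .filter(t -> p == null || t.predicate.equals(p))
                .filter(t -> o == null || Objects.equals(t.object, o));
    }
}

public class Main {
    public static void main(String[] args) {
        Iri planet = new Iri("http://dbpedia.org/ontology/Planet");
        Iri type = new Iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type");
        Graph g = new Graph(List.of(
                new Triple(planet, type,
                        new Iri("http://www.w3.org/2002/07/owl#Class"))));
        // The Iterator loop collapses to a map/forEach pipeline:
        g.stream(planet, type, null)
                .map(Triple::getObject)
                .forEach(System.out::println);
    }
}
```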
>> >>>>>
>> >>>>> Cheers,
>> >>>>> Reto
>> >>>>>
>> >>>>> On Fri, Mar 20, 2015 at 4:16 PM, Stian Soiland-Reyes <[email protected]>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> I think a query interface, as you say, is orthogonal to Reto's
>> >>>>>> impl.sparql module - which is trying to be an implementation of RDF
>> >>>>>> Commons that is backed only by a remote SPARQL endpoint. Thus it
>> >>>>>> touches on important edges like streaming and blank node identities.
>> >>>>>>
>> >>>>>> It's not a SPARQL endpoint backed by RDF Commons! :-)
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On 20 March 2015 at 10:58, Sergio Fernández <[email protected]> wrote:
>> >>>>>>
>> >>>>>>> Hi Reto,
>> >>>>>>>
>> >>>>>>> yes, that was a deliberate decision in the early phases. I'd need
>> >>>>>>> to look it up; I do not remember the concrete issue.
>> >>>>>>>
>> >>>>>>> Just going a bit deeper into the topic: in querying we are talking
>> >>>>>>> not only about providing native support to query Graph instances,
>> >>>>>>> but also about providing common interfaces to interact with the
>> >>>>>>> results.
>> >>>>>>>
>> >>>>>>> The idea was to keep the focus on RDF 1.1 concepts before moving
>> >>>>>>> to querying.
>> >>>>>>
>> >>>>>>> Personally I'd prefer to keep that scope for the first incubator
>> >>>>>>> release, and then start to open discussions about those kinds of
>> >>>>>>> threads. But of course we can vote to change that approach.
>> >>>>>>>
>> >>>>>>> Cheers,
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On 17/03/15 11:05, Reto Gmür wrote:
>> >>>>>>>
>> >>>>>>>>
>> >>>>>>>> Hi Sergio,
>> >>>>>>>>
>> >>>>>>>> I'm not sure which deliberate decision you are referring to - is
>> >>>>>>>> it issue #35 on GitHub?
>> >>>>>>>>
>> >>>>>>>> Anyway, the impl.sparql code is not about extending the API to
>> >>>>>>>> allow running queries on a graph; in fact the API isn't extended
>> >>>>>>>> at all. It's an implementation of the API which is backed by a
>> >>>>>>>> SPARQL endpoint. Very often the triple store doesn't run in the
>> >>>>>>>> same VM as the client, and so it is necessary that
>> >>>>>>>> implementations of the API speak to a remote triple store. This
>> >>>>>>>> can use proprietary protocols or standard SPARQL; this is an
>> >>>>>>>> implementation for SPARQL and can thus be used against any SPARQL
>> >>>>>>>> endpoint.
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>> Reto
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Tue, Mar 17, 2015 at 7:41 AM, Sergio Fernández <[email protected]>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi Reto,
>> >>>>>>>>>
>> >>>>>>>>> thanks for updating us with the status from Clerezza.
>> >>>>>>>>>
>> >>>>>>>>> In the current Commons RDF API we deliberately skipped querying
>> >>>>>>>>> for the early versions.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Although I'd prefer to keep this approach in the initial steps
>> >>>>>>>>> at the ASF (I hope we can import the code soon...), that's for
>> >>>>>>>>> sure one of the next points to discuss in the project, where all
>> >>>>>>>>> that experience is valuable.
>> >>>>>>>>>
>> >>>>>>>>> Cheers,
>> >>>>>>>>>
>> >>>>>>>>> On 16/03/15 13:02, Reto Gmür wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Hello,
>> >>>>>>>>>>
>> >>>>>>>>>> With the new repository, the Clerezza RDF commons previously in
>> >>>>>>>>>> the commons sandbox are now at:
>> >>>>>>>>>>
>> >>>>>>>>>> https://git-wip-us.apache.org/repos/asf/clerezza-rdf-core.git
>> >>>>>>>>>>
>> >>>>>>>>>> I will compare that code with the current status of the code in
>> >>>>>>>>>> the incubating rdf-commons project in a later mail.
>> >>>>>>>>>>
>> >>>>>>>>>> Now I would like to bring to your attention a big step forward
>> >>>>>>>>>> towards CLEREZZA-856. The impl.sparql module provides an
>> >>>>>>>>>> implementation of the API on top of a SPARQL endpoint.
>> >>>>>>>>>> Currently it only supports read access. For usage examples see
>> >>>>>>>>>> the tests in /src/test/java/org/apache/commons/rdf/impl/sparql (
>> >>>>>>>>>> https://git-wip-us.apache.org/repos/asf?p=clerezza-rdf-core.git;a=tree;f=impl.sparql/src/test/java/org/apache/commons/rdf/impl/sparql;h=cb9c98bcf427452392e74cd162c08ab308359c13;hb=HEAD
>> >>>>>>>>>> )
>> >>>>>>>>>>
>> >>>>>>>>>> The hard part was supporting BlankNodes. The current
>> >>>>>>>>>> implementation handles them correctly even in tricky
>> >>>>>>>>>> situations; however, the current code is not yet optimized for
>> >>>>>>>>>> performance. As soon as BlankNodes are involved, many queries
>> >>>>>>>>>> have to be sent to the backend. I'm sure some SPARQL wizard
>> >>>>>>>>>> could help make things more efficient.
>> >>>>>>>>>>
>> >>>>>>>>>> Since SPARQL is the only standardized method to query RDF data,
>> >>>>>>>>>> I think being able to façade an RDF Graph accessible via SPARQL
>> >>>>>>>>>> is an important use case for an RDF API, so it would be good to
>> >>>>>>>>>> also have a SPARQL-backed implementation of the API proposal in
>> >>>>>>>>>> the incubating commons-rdf repository.
>> >>>>>>>>>>
>> >>>>>>>>>> Cheers,
>> >>>>>>>>>> Reto
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>   --
>> >>>>>>>>>
>> >>>>>>>>> Sergio Fernández
>> >>>>>>>>> Partner Technology Manager
>> >>>>>>>>> Redlink GmbH
>> >>>>>>>>> m: +43 660 2747 925
>> >>>>>>>>> e: [email protected]
>> >>>>>>>>> w: http://redlink.co
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>> --
>> >>>>>>> Sergio Fernández
>> >>>>>>> Partner Technology Manager
>> >>>>>>> Redlink GmbH
>> >>>>>>> m: +43 660 2747 925
>> >>>>>>> e: [email protected]
>> >>>>>>> w: http://redlink.co
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> Stian Soiland-Reyes
>> >>>>>> Apache Taverna (incubating), Apache Commons RDF (incubating)
>> >>>>>> http://orcid.org/0000-0001-9842-9718
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>> --
>> >>>> Sergio Fernández
>> >>>> Partner Technology Manager
>> >>>> Redlink GmbH
>> >>>> m: +43 660 2747 925
>> >>>> e: [email protected]
>> >>>> w: http://redlink.co
>> >>>>
>> >>>
>> >>
>>
>>
>>
>> --
>> Stian Soiland-Reyes
>> Apache Taverna (incubating), Apache Commons RDF (incubating)
>> http://orcid.org/0000-0001-9842-9718
>>



-- 
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718
