On 23/03/15 10:25, Reto Gmür wrote:
Right now the API on GitHub says nothing about the identity and hashCode
of any term. In order to have interoperable implementations, it is
essential to define the value of hashCode and the identity conditions
for the RDF terms which are not locally scoped, i.e. for IRIs and Literals.
+1
I suggest taking the definitions from the Clerezza RDF commons.
Absent an active JIRA at the moment, could you email them here, please?
Given Peter is spending time on his implementation, this might be quite
useful to him.
Andy
Reto
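(Illustration: a hypothetical sketch of such identity conditions,
defining equality and hashCode purely by the IRI's Unicode string. The
names here are made up; the thread proposes taking the actual
definitions from the Clerezza RDF commons.)

// Hypothetical sketch: identity for an IRI defined by its Unicode
// string alone, so instances from different implementations of the
// same interface compare equal and hash alike.
interface Iri {
    String getIRIString();
}

final class SimpleIri implements Iri {
    private final String iriString;

    SimpleIri(String iriString) {
        this.iriString = iriString;
    }

    @Override
    public String getIRIString() {
        return iriString;
    }

    @Override
    public boolean equals(Object other) {
        // Equal to ANY Iri implementation with the same Unicode string.
        return other instanceof Iri
                && iriString.equals(((Iri) other).getIRIString());
    }

    @Override
    public int hashCode() {
        // Must be the same across implementations: hash of the string.
        return iriString.hashCode();
    }
}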
On Mon, Mar 23, 2015 at 10:18 AM, Stian Soiland-Reyes <[email protected]>
wrote:
OK - I can see that settling BlankNode equality can take some more time
(also considering the SPARQL example).
So then we must keep the "internalIdentifier" and the abstract concept
of the "local scope" for the next release.
In which case this one should also be applied:
https://github.com/commons-rdf/commons-rdf/pull/48/files
and perhaps:
https://github.com/commons-rdf/commons-rdf/pull/61/files
I would then need to fix the simple GraphImpl.add() to clone the
BlankNodes and change their local scope, as otherwise it would wrongly
merge graph1.b1 and graph2.b1 (both having the same internalIdentifier
and, once added, the abstract local scope of being in the same Graph).
This can happen when, say, copying triples from one graph to another
(see the sketch below).
Raised and detailed in
https://github.com/commons-rdf/commons-rdf/issues/66
.. adding this to the tests sounds crucial, and would help us later
when sorting this out.
This is in no way a complete resolution. (New bugs would arise, e.g.
you could add a triple with a BlankNode and then be unable to remove it
afterwards with the same arguments.)
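(Illustration: a minimal sketch of the cloning idea described above,
assuming blank nodes carry a string internalIdentifier; all names here
are made up.)

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: on add(), the graph maps each incoming blank
// node identifier to a clone scoped to this graph, so graph1.b1 and
// graph2.b1 stay distinct even when their internalIdentifiers coincide.
class BlankNodeScope {
    // A random scope identifier, one per graph instance.
    private final UUID graphScope = UUID.randomUUID();
    // Remembers the mapping so the same incoming node always maps to
    // the same clone within this graph.
    private final Map<String, String> clones = new HashMap<>();

    // Identifier to store for an incoming blank node's identifier.
    String scopedIdentifier(String internalIdentifier) {
        return clones.computeIfAbsent(internalIdentifier,
                id -> graphScope + ":" + id);
    }
}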
On 22 March 2015 at 21:00, Peter Ansell <[email protected]> wrote:
+1
Although it is not urgent to release a 1.0 version, it is urgent to
release (and keep releasing often) what we have changed since 0.0.2 so
we can start experimenting with it, particularly since I have started
working more intently on Sesame 4 in the last few weeks. Stian's pull
requests to change the BNode situation could wait until after 0.0.3 is
released, at this point.
Cheers,
Peter
On 21 March 2015 at 22:37, Andy Seaborne <[email protected]> wrote:
I agree with Sergio that releasing something is important.
We need to release; then independent groups can start to build on it,
and we get grounded requirements and a wider community.
Andy
On 21/03/15 09:10, Reto Gmür wrote:
Hi Sergio,
I don't see where an urgent agenda comes from. Several RDF APIs already
exist, so a new API essentially needs to be better rather than done
with urgency.
The SPARQL implementation is less something that needs to be part of
the first release than something that helps validate the API proposal.
We should validate our API against many possible use cases and then
discuss which are more important to support. In my opinion, for an RDF
API it is more important that it can be used with remote repositories
over standard protocols than that it supports Hadoop-style processing
across many machines [1], but maybe we can support both use cases.
In any case I think it's good to have prototypical implementations of
use cases, to see which API features are needed and which are
problematic. So I would encourage writing prototype use cases where
Hadoop-style processing shows the need for an exposed blank node ID, or
a prototype showing that IRI is better as an interface than as a class
(a sketch of the latter follows below), etc. In the end we need to
decide on the API features based on the use cases they are required by
or compatible with. But it's hard to see the requirements without
prototypical code.
Cheers,
Reto
1.
https://github.com/commons-rdf/commons-rdf/pull/48#issuecomment-72689214
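(Illustration: a hypothetical sketch of the "interface vs. class"
question for IRI; the names here are made up and not part of the
current API.)

// Hypothetical sketch: IRI as an interface rather than a concrete
// class, so an implementation can expose a native type directly.
interface Iri {
    String getIRIString();
}

// e.g. wrapping java.net.URI without copying into a fixed class:
final class UriBackedIri implements Iri {
    private final java.net.URI uri;

    UriBackedIri(java.net.URI uri) {
        this.uri = uri;
    }

    @Override
    public String getIRIString() {
        return uri.toString();
    }
}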
On Fri, Mar 20, 2015 at 8:30 PM, Sergio Fernández <[email protected]>
wrote:
I perfectly understand what you are targeting. But still, from my point
of view it is outside our urgent agenda. Not because it is not
interesting, just because there are more urgent things to deal with. I
think the most important thing is to get running with what we have, and
get a release out. But, as I said, we can discuss it.
On 20/03/15 19:10, Reto Gmür wrote:
Just a little usage example to illustrate Stian's point:
// Assumes the commons-rdf API types (Graph, Triple, Iri) and the
// impl.sparql SparqlGraph implementation are on the classpath.
import java.util.Iterator;

public class Main {
    public static void main(String... args) {
        // DBpedia's public SPARQL endpoint, wrapped as a Graph.
        Graph g = new SparqlGraph("http://dbpedia.org/sparql");
        Iterator<Triple> iter = g.filter(
                new Iri("http://dbpedia.org/ontology/Planet"),
                new Iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
                null);
        while (iter.hasNext()) {
            System.out.println(iter.next().getObject());
        }
    }
}
I think with Stian's version using streams the above could be shorter
and nicer (a hypothetical sketch follows below). But the important part
is that the above allows using DBpedia as a graph without worrying
about SPARQL.
Cheers,
Reto
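(Illustration: a hypothetical stream-based variant of the loop above. A
stream(...) method does not exist in the current API; it is assumed
here purely for the sake of the sketch.)

// Hypothetical: if the filter results were exposed as a
// java.util.stream.Stream<Triple>, the loop above could become:
g.stream(new Iri("http://dbpedia.org/ontology/Planet"),
         new Iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
         null)
 .map(Triple::getObject)
 .forEach(System.out::println);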
On Fri, Mar 20, 2015 at 4:16 PM, Stian Soiland-Reyes <
[email protected]>
wrote:
I think a query interface as you say is orthogonal to Reto's
impl.sparql module - which is trying to be an implementation of RDF
Commons that is backed only by a remote SPARQL endpoint. Thus it
touches on important edge cases like streaming and blank node identities.
It's not a SPARQL endpoint backed by RDF Commons! :-)
On 20 March 2015 at 10:58, Sergio Fernández <[email protected]>
wrote:
Hi Reto,
yes, that was a deliberate decision in the early phases. I'd need to
look it up; I do not remember the concrete issue.
Just going a bit deeper into the topic: with querying we are talking
not only about providing native support to query Graph instances, but
also about providing common interfaces to interact with the results.
The idea was to keep the focus on RDF 1.1 concepts before moving to
query.
Personally I'd prefer to keep that scope for the first incubator
release, and then start to open discussions about topics like this. But
of course we can vote to change that approach.
Cheers,
On 17/03/15 11:05, Reto Gmür wrote:
Hi Sergio,
I'm not sure which deliberate decision you are referring to; is it
issue #35 on GitHub?
Anyway, the impl.sparql code is not about extending the API to allow
running queries on a graph; in fact the API isn't extended at all. It's
an implementation of the API which is backed by a SPARQL endpoint. Very
often the triple store doesn't run in the same VM as the client, so it
is necessary that implementations of the API can speak to a remote
triple store. This could use a proprietary protocol or standard SPARQL;
this one is an implementation for SPARQL and can thus be used against
any SPARQL endpoint.
Cheers,
Reto
On Tue, Mar 17, 2015 at 7:41 AM, Sergio Fernández <
[email protected]>
wrote:
Hi Reto,
thanks for updating us with the status from Clerezza.
In the current Commons RDF API we deliberately skipped querying for the
early versions.
Although I'd prefer to keep this approach in the initial steps at
ASF
(I
hope we can import the code soon...), that's for sure one of the
next
points to discuss in the project, where all that experience is
valuable.
Cheers,
On 16/03/15 13:02, Reto Gmür wrote:
Hello,
With the new repository, the Clerezza RDF commons previously in the
commons sandbox is now at:
https://git-wip-us.apache.org/repos/asf/clerezza-rdf-core.git
I will compare that code with the current state of the code in the
incubating commons-rdf project in a later mail.
Now I would like to bring to your attention a big step forward towards
CLEREZZA-856. The impl.sparql modules provide an implementation of the
API on top of a SPARQL endpoint. Currently it only supports read
access. For usage examples see the tests in
/src/test/java/org/apache/commons/rdf/impl/sparql
(https://git-wip-us.apache.org/repos/asf?p=clerezza-rdf-core.git;a=tree;f=impl.sparql/src/test/java/org/apache/commons/rdf/impl/sparql;h=cb9c98bcf427452392e74cd162c08ab308359c13;hb=HEAD)
The hard part was supporting BlankNodes. The current implementation
handles them correctly even in tricky situations; however, the code is
not yet optimized for performance. As soon as BlankNodes are involved,
many queries have to be sent to the backend. I'm sure some SPARQL
wizard could help make things more efficient.
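(Illustration: a hypothetical sketch of how a filter(s, p, null) call
might map to a single SPARQL SELECT; the real impl.sparql code is more
involved, especially once blank nodes appear.)

// Hypothetical sketch of mapping filter(s, p, null) to a single
// SPARQL query. The real implementation must do much more work when
// blank nodes are involved, since a blank node cannot be referred to
// by name across queries and has to be re-identified by the triples
// around it.
class SparqlQueries {
    static String objectQuery(String subjectIri, String predicateIri) {
        return "SELECT ?o WHERE { <" + subjectIri + "> <"
                + predicateIri + "> ?o }";
    }
}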
Since SPARQL is the only standardized method to query RDF data, I think
being able to façade an RDF Graph accessible via SPARQL is an important
use case for an RDF API, so it would be good to also have a
SPARQL-backed implementation of the API proposal in the incubating
commons-rdf repository.
Cheers,
Reto
--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 660 2747 925
e: [email protected]
w: http://redlink.co
--
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718