> -----Original Message-----
> From: Peter Ansell [mailto:[EMAIL PROTECTED]
> Sent: 22 November 2008 21:54
> To: Kingsley Idehen
> Cc: [email protected]; virtuoso-
> [EMAIL PROTECTED]
> Subject: Re: [Dbpedia-discussion] DBPedia 3.2 Load in Virtuoso 5.0.9 -
> Reporting on results, and some questions
> > Duplicates!
> > Can someone please explain this?
> >
> > As an aside, when I run this from isql on my newly installed local DBpedia,
> > I get no duplicates (I haven't tried Jena against my local install).
> >
> >
> > <eom>
> >
> >
Kingsley wrote:
> Marvin,
>
> You will see why when you run:
>
> select *
> where {graph ?g {
> ?s
> <http://dbpedia.org/property/influenced>
> <http://dbpedia.org/resource/Chris_Rock>
> }}
>
> As you can see, there are two graphs:
> 1. http://dbpedia.org
> 2. http://dbpedia.org/resource/<entity> (this one results from cache
> activity associated with client interactions with Virtuoso)
>
> Solutions:
> -- Being specific about source Graph by specifying Graph IRI
> select ?s
> where {graph <http://dbpedia.org> {
> ?s
> <http://dbpedia.org/property/influenced>
> <http://dbpedia.org/resource/Chris_Rock>
> }}
> OR
>
> select ?s
> from <http://dbpedia.org>
> where {
> ?s
> <http://dbpedia.org/property/influenced>
> <http://dbpedia.org/resource/Chris_Rock>
> }
> -- Using DISTINCT
>
> select distinct ?s
> where {
> ?s
> <http://dbpedia.org/property/influenced>
> <http://dbpedia.org/resource/Chris_Rock>
> }
>
Peter wrote:
> What is the instruction to give to Jena or other clients to make them
> behave in the same way as the HTTP SPARQL page interface and not resolve
> triples from the cache graphs?
For Jena, when a call of:
qexec = QueryExecutionFactory.sparqlService("http://DBpedia.org/sparql", q);
is made, the query is passed as-is to the SPARQL endpoint. The result set
comes back in the SPARQL Results Format and is parsed to produce the local
programming objects. There is no additional processing client-side. Duplicates
should not come back from that pattern, but the client-side code does not check
that the endpoint is functioning correctly.
In SPARQL, matching a basic graph pattern, or a triple pattern with one
variable, does not give duplicates because an RDF graph is a set of triples.
(Duplicates are only possible if the pattern includes a blank node: think of
that as a variable that is projected away and which, like any projection, can
result in duplicates across the narrower intermediate result.)
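To make the blank node point concrete, here is a minimal sketch in plain Java
(no Jena involved). The class name, the ex: terms and the way a triple is
modelled as a three-element list are all made up for illustration; this is not
how any real engine evaluates patterns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class BlankNodeProjectionSketch {
    // A triple is modelled as a 3-element list: subject, predicate, object.
    public static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    // Hypothetical graph. It is a set, so a repeated triple is stored once.
    public static Set<List<String>> exampleGraph() {
        Set<List<String>> g = new LinkedHashSet<>();
        g.add(triple("ex:a", "ex:knows", "ex:b"));
        g.add(triple("ex:a", "ex:knows", "ex:c"));
        g.add(triple("ex:a", "ex:knows", "ex:b")); // ignored: already present
        return g;
    }

    // Pattern { ?s <p> <o> }: one variable, so each solution corresponds to
    // a distinct triple and no subject can be repeated.
    public static List<String> subjectsFor(Set<List<String>> g, String p, String o) {
        List<String> out = new ArrayList<>();
        for (List<String> t : g) {
            if (t.get(1).equals(p) && t.get(2).equals(o)) {
                out.add(t.get(0));
            }
        }
        return out;
    }

    // Pattern { ?s <p> _:x }: the blank node matches any object and is then
    // dropped, so a subject appears once per matching object.
    public static List<String> subjectsWithAnyObject(Set<List<String>> g, String p) {
        List<String> out = new ArrayList<>();
        for (List<String> t : g) {
            if (t.get(1).equals(p)) {
                out.add(t.get(0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<List<String>> g = exampleGraph();
        System.out.println(subjectsFor(g, "ex:knows", "ex:b"));   // [ex:a]
        System.out.println(subjectsWithAnyObject(g, "ex:knows")); // [ex:a, ex:a]
    }
}
```

The second result repeats ex:a only because the object position was matched
and then thrown away, which is the projection effect described above.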
If a union of other graphs underlies the virtual graph, then the compound
graph should still appear to be a set of statements, which will not produce
duplicates. By just passing the query over as-is, there is an assumption that
the endpoint will respect those semantics.
Suppressing duplicates from the client side would require changing the query,
e.g. using DISTINCT.
In Jena this happens in quite a few places: we have union graphs, and the
inference engines would produce duplicates if they didn't suppress them. The
storage layers SDB and TDB [*] both support query over the union of named
graphs in an RDF dataset, and both suppress the duplicates that occur, to give
the set-of-triples view.
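As a rough illustration of the difference (plain Java again, and emphatically
not the actual SDB or TDB code), the sketch below stores one statement in two
named graphs, as with the main graph and a cache graph, and matches it first
against a naive concatenation and then against a set union. The class name,
the method names and the subject ex:someComedian are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UnionGraphSketch {
    // A triple is modelled as a 3-element list: subject, predicate, object.
    public static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    // Two hypothetical named graphs that both hold the same statement.
    public static List<List<List<String>>> exampleGraphs() {
        List<String> t = triple("ex:someComedian", "dbp:influenced", "dbr:Chris_Rock");
        List<List<String>> mainGraph = Arrays.asList(t);
        List<List<String>> cacheGraph = Arrays.asList(t);
        return Arrays.asList(mainGraph, cacheGraph);
    }

    // Naive union: every graph is scanned in turn, so a statement held in
    // two graphs is matched twice.
    public static List<String> subjectsNaive(List<List<List<String>>> graphs, String p, String o) {
        List<String> out = new ArrayList<>();
        for (List<List<String>> g : graphs) {
            for (List<String> t : g) {
                if (t.get(1).equals(p) && t.get(2).equals(o)) {
                    out.add(t.get(0));
                }
            }
        }
        return out;
    }

    // Set-of-triples union: the graphs are merged into a set first, so the
    // shared statement is matched once.
    public static List<String> subjectsSetUnion(List<List<List<String>>> graphs, String p, String o) {
        Set<List<String>> union = new LinkedHashSet<>();
        for (List<List<String>> g : graphs) {
            union.addAll(g);
        }
        List<String> out = new ArrayList<>();
        for (List<String> t : union) {
            if (t.get(1).equals(p) && t.get(2).equals(o)) {
                out.add(t.get(0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<List<String>>> graphs = exampleGraphs();
        // [ex:someComedian, ex:someComedian]
        System.out.println(subjectsNaive(graphs, "dbp:influenced", "dbr:Chris_Rock"));
        // [ex:someComedian]
        System.out.println(subjectsSetUnion(graphs, "dbp:influenced", "dbr:Chris_Rock"));
    }
}
```

A real store would not materialise the union like this; the point is only that
the caller should see the second behaviour, whatever the implementation does
underneath.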
Andy
[*] In the SVN only. It didn't make the last release.
>
> Cheers,
>
> Peter