[
https://issues.apache.org/jira/browse/JENA-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186988#comment-17186988
]
Claus Stadler commented on JENA-1945:
-------------------------------------
So there are actually three slightly different use cases here for transforming
triple patterns to quads:
- (a) The current way it is done: Have all triple patterns of the original
query match within any single named graph instead.
This is useful if one knows that all data is in the same named graph but
does not know, or care, which one it is.
- (b) Turn each triple pattern (s p o) into a quad ([] s p o) with a fresh
variable/bnode in the graph component, and do not eliminate duplicates.
This is probably good enough in practice. Also, if I am not mistaken, this
is essentially what Virtuoso does. However, the virtual graph would not be
a standard RDF graph because of the duplicates; i.e. the result of
{code}SELECT (COUNT(*) AS ?c) { ?s ?p ?o }{code}
could be different from
{code}SELECT (COUNT(DISTINCT *) AS ?c) { ?s ?p ?o }{code}
- (c) Fully rewrite each triple pattern such that it is evaluated over a unique
set of triples derived from a set of named graphs.
For example, a triple pattern with 3 variables would rewrite as:
{code}
(?s ?p ?o) => OpDistinct(OpProject([?s ?p ?o], OpQuad(?g ?s ?p ?o)))
{code}
A triple pattern with a constant rewrites as follows:
{code}
(?s rdfs:label ?o) =>
  OpProject([?s ?o],
    OpDistinct(
      OpProject([?s ?p ?o],
        OpFilter(?p = rdfs:label,
          OpQuad(?g ?s ?p ?o)))))
{code}
which corresponds to
{code}
SELECT ?s ?o {
  SELECT DISTINCT ?s ?p ?o {
    GRAPH ?g { ?s ?p ?o }
    FILTER(?p = rdfs:label)
  }
}
{code}
This would give a real virtual union graph, but it looks like the
performance impact would be drastic.
While joins of distincts can be optimized ( * ), the outermost projection
prevents it here: after OpDistinct we know the cardinality of every binding
is 1, but after a projection that hides one or more variables from the
bindings we lose that information.
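To make the difference between (b) and (c) concrete, here is a minimal
sketch in plain Java. It does not use Jena: quads are modelled as string
lists, and the class and method names (UnionGraphDemo, optionB, optionC,
projectSO) are hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

final class UnionGraphDemo {
    // Assumed quad layout for this sketch: (graph, subject, predicate, object).

    // Option (b): drop the graph component and keep duplicates.
    static List<List<String>> optionB(List<List<String>> quads) {
        return quads.stream()
                .map(q -> q.subList(1, 4))
                .collect(Collectors.toList());
    }

    // Option (c): drop the graph component, then remove duplicate triples.
    static List<List<String>> optionC(List<List<String>> quads) {
        return quads.stream()
                .map(q -> q.subList(1, 4))
                .distinct()
                .collect(Collectors.toList());
    }

    // Project (s, p, o) rows to (s, o), hiding the predicate.
    static List<List<String>> projectSO(List<List<String>> triples) {
        return triples.stream()
                .map(t -> List.of(t.get(0), t.get(2)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> quads = List.of(
                List.of("urn:g1", "urn:s", "urn:p1", "urn:o"),
                List.of("urn:g2", "urn:s", "urn:p1", "urn:o"),
                List.of("urn:g2", "urn:s", "urn:p2", "urn:o"));

        System.out.println(optionB(quads).size()); // 3 rows: the duplicate is kept
        System.out.println(optionC(quads).size()); // 2 rows: a proper set of triples
        // Hiding ?p after the distinct yields the same (s, o) row twice, so
        // "every binding has cardinality 1" no longer holds.
        System.out.println(projectSO(optionC(quads)));
    }
}
```

With these three quads, the count under (b) differs from the distinct
count, and the projection after the distinct in (c) brings duplicates back.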
So personally I find algebraic transformations for options (a) and (b) very
useful.
Option (c) is probably more interesting from an academic perspective, as a
test of how well existing systems can cope with it.
As I have a personal use case for this (and I am not totally unfamiliar with
Jena's algebraic transformation system) I could look into contributing an
implementation of one or more of these options if you consider it/them useful.
( * ) If I am not mistaken, the following rewrite is semantically equivalent
under bag semantics:
JOIN(DISTINCT(O1), DISTINCT(O2)) => DISTINCT(JOIN(O1, O2)).
Sketch based on the definitions in "The multiset semantics of SPARQL
patterns" by R. Angles et al., ISWC 2016, page 7:
- The cardinality of a binding after applying DISTINCT is trivially 1.
- Only compatible bindings remain after a JOIN, and the cardinalities of the
bindings that are merged into the new one are multiplied. Hence, if DISTINCT
was applied to both arguments of the join, the cardinality of every new
binding created from compatible ones must be the product 1 * 1, which is 1.
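The cardinality argument in the footnote can be tried out on a small
example. The following is a sketch in plain Java, not Jena code: a bag is a
map from solution mapping to cardinality, and the names (BagJoinDemo,
distinct, join, compatible) are hypothetical. It assumes that all mappings
within one operand bind the same variables, so each joined mapping comes
from exactly one pair of inputs. It checks a single input and is not a
proof.

```java
import java.util.HashMap;
import java.util.Map;

final class BagJoinDemo {
    // DISTINCT: every mapping that occurs gets cardinality 1.
    static Map<Map<String, String>, Integer> distinct(Map<Map<String, String>, Integer> bag) {
        Map<Map<String, String>, Integer> out = new HashMap<>();
        for (Map<String, String> m : bag.keySet()) {
            out.put(m, 1);
        }
        return out;
    }

    // Two mappings are compatible if they agree on every shared variable.
    static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    // JOIN: merge compatible mappings and multiply their cardinalities.
    static Map<Map<String, String>, Integer> join(Map<Map<String, String>, Integer> left,
                                                  Map<Map<String, String>, Integer> right) {
        Map<Map<String, String>, Integer> out = new HashMap<>();
        for (Map.Entry<Map<String, String>, Integer> l : left.entrySet()) {
            for (Map.Entry<Map<String, String>, Integer> r : right.entrySet()) {
                if (!compatible(l.getKey(), r.getKey())) {
                    continue;
                }
                Map<String, String> merged = new HashMap<>(l.getKey());
                merged.putAll(r.getKey());
                out.merge(merged, l.getValue() * r.getValue(), Integer::sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Map<String, String>, Integer> o1 = new HashMap<>();
        o1.put(Map.of("s", "a", "x", "1"), 2);
        o1.put(Map.of("s", "b", "x", "1"), 1);
        Map<Map<String, String>, Integer> o2 = new HashMap<>();
        o2.put(Map.of("x", "1", "o", "c"), 3);

        Map<Map<String, String>, Integer> lhs = join(distinct(o1), distinct(o2));
        Map<Map<String, String>, Integer> rhs = distinct(join(o1, o2));
        System.out.println(lhs.equals(rhs)); // true for this input
    }
}
```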
> Algebra.unionDefaultGraph: OpPath not handled
> ---------------------------------------------
>
> Key: JENA-1945
> URL: https://issues.apache.org/jira/browse/JENA-1945
> Project: Apache Jena
> Issue Type: Improvement
> Components: ARQ
> Affects Versions: Jena 3.16.0
> Reporter: Claus Stadler
> Priority: Major
>
>
> {code:java}
> System.out.println(
>     Algebra.unionDefaultGraph(
>         Algebra.compile(
>             QueryFactory.create("SELECT * { ?s <urn:p> ?o }"))));
> /* Yields correct result:
> (distinct
>   (graph ??_
>     (bgp (triple ?s <urn:p> ?o))))
> */
> System.out.println(
>     Algebra.unionDefaultGraph(
>         Algebra.compile(
>             QueryFactory.create("SELECT * { ?s <urn:p1>/<urn:p2>/<urn:p3> ?o }"))));
> /* Yields incorrect result because wrapping with graph ??_ (and distinct) is missing:
> (path ?s (seq (seq <urn:p1> <urn:p2>) <urn:p3>) ?o)
> */
> {code}
>
> It seems
> [TransformUnionQuery.java|https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/algebra/TransformUnionQuery.java#L34]
> lacks the handling of (at least) OpPath