[ 
https://issues.apache.org/jira/browse/JENA-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186988#comment-17186988
 ] 

Claus Stadler commented on JENA-1945:
-------------------------------------

So there are actually three slightly different use cases here for transforming 
triple patterns to quads:

- (a) The current way it is done: Have all triple patterns of the original 
query match within any single named graph instead.
      This is useful if one knows that all data is in the same named graph but 
does not know (or care) which one it is.
- (b) Turn each triple pattern (s p o) into a quad ([] s p o) with a fresh 
variable/bnode in the graph component, without eliminating duplicates.
      This is probably good enough in practice. Also, if I am not mistaken, 
this is essentially what Virtuoso does. However, the virtual graph would not be 
a standard RDF graph because of the duplicates; i.e. the result of 
{code}SELECT (COUNT(*) AS ?c) { ?s ?p ?o }{code} could differ from 
{code}SELECT (COUNT(DISTINCT *) AS ?c) { ?s ?p ?o }{code}.
- (c) Fully rewrite each triple pattern such that it is evaluated over a unique 
set of triples derived from a set of named graphs.
     For example, a triple pattern with 3 variables would rewrite as:
{code}
     (?s ?p ?o) => OpDistinct(OpProject([?s ?p ?o], OpQuad(?g ?s ?p ?o)))
{code}

     A triple pattern with a constant rewrites as follows:
{code}
     (?s rdfs:label ?o) =>
       OpProject([?s ?o],
         OpDistinct(
            OpProject([?s ?p ?o],
              OpFilter(?p = rdfs:label,
                 OpQuad(?g ?s ?p ?o)))))
{code}

      which corresponds to
{code}
      SELECT ?s ?o { SELECT DISTINCT ?s ?p ?o { GRAPH ?g { ?s ?p ?o } FILTER(?p = rdfs:label) } }
{code}
      This would give a real virtual union graph, but it looks like the 
performance impact would be drastic.
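
To make the difference between (b) and (c) concrete for a single triple 
pattern with three variables, here is a minimal sketch in plain Java. It does 
not use Jena; the class name UnionGraphSketch and the methods optionB/optionC 
are made up for illustration, and quads are simply modelled as lists of four 
strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Jena.
class UnionGraphSketch {
    // A quad is modelled as [graph, subject, predicate, object],
    // a triple as [subject, predicate, object].

    // Option (b): match (?s ?p ?o) against every quad and drop the graph term.
    // Duplicates are kept, so the result is a bag (modelled as a list).
    static List<List<String>> optionB(List<List<String>> quads) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> quad : quads) {
            result.add(List.copyOf(quad.subList(1, 4)));
        }
        return result;
    }

    // Option (c): the same projection followed by DISTINCT,
    // i.e. a duplicate-free set of triples.
    static Set<List<String>> optionC(List<List<String>> quads) {
        return new LinkedHashSet<>(optionB(quads));
    }

    public static void main(String[] args) {
        // The triple (s1 p o1) is present in both g1 and g2.
        List<List<String>> quads = List.of(
            List.of("g1", "s1", "p", "o1"),
            List.of("g2", "s1", "p", "o1"),
            List.of("g2", "s2", "p", "o2"));
        System.out.println("COUNT(*) under (b): " + optionB(quads).size());
        System.out.println("COUNT(*) under (c): " + optionC(quads).size());
    }
}
```

With this data, (b) counts the shared triple once per graph (3 rows), while 
(c) counts it once (2 rows), which is the COUNT(*) vs COUNT(DISTINCT *) 
difference mentioned under (b).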


While joins of distincts can be optimized ( * ), the outermost projection 
prevents it: after OpDistinct we know that the cardinality of every binding is 
1, but after a projection that hides one or more variables from the bindings we 
lose that information.
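
A small plain-Java sketch of that general effect (again without Jena; 
ProjectionSketch and project are hypothetical names, and bindings are modelled 
as maps from variable name to value): two bindings that are distinct over 
?s ?p ?o collapse into one binding with cardinality 2 once ?p is projected 
away.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Jena.
class ProjectionSketch {
    // Project every binding onto the given variables and record how often
    // each projected binding occurs (binding -> cardinality).
    static Map<Map<String, String>, Integer> project(
            List<Map<String, String>> bindings, List<String> vars) {
        Map<Map<String, String>, Integer> bag = new LinkedHashMap<>();
        for (Map<String, String> binding : bindings) {
            Map<String, String> projected = new LinkedHashMap<>();
            for (String v : vars) {
                if (binding.containsKey(v)) {
                    projected.put(v, binding.get(v));
                }
            }
            bag.merge(projected, 1, Integer::sum);
        }
        return bag;
    }

    public static void main(String[] args) {
        // Output of a DISTINCT over ?s ?p ?o: every binding has cardinality 1.
        List<Map<String, String>> distinct = List.of(
            Map.of("s", "s1", "p", "p1", "o", "x"),
            Map.of("s", "s1", "p", "p2", "o", "x"));
        // Hiding ?p merges the two bindings into one with cardinality 2.
        System.out.println(project(distinct, List.of("s", "o")));
    }
}
```

In the rdfs:label example above the filter fixes ?p to a constant, so no 
duplicates can actually arise there, but an optimizer that only looks at the 
operator shape cannot tell.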


So personally I find algebraic transformations for options (a) and (b) very 
useful.
Option (c) is probably more interesting from an academic perspective, as a 
test of how well existing systems can cope with it.

As I have a personal use case for this (and I am not totally unfamiliar with 
Jena's algebraic transformation system) I could look into contributing an 
implementation of one or more of these options if you consider it/them useful.



( * ) If I am not mistaken, the following rewrite is semantically equivalent 
under bag semantics:

JOIN(DISTINCT(O1), DISTINCT(O2)) => DISTINCT(JOIN(O1, O2)).
Sketch based on the definitions in "The multiset semantics of SPARQL patterns" 
by R. Angles et al., ISWC 2016, page 7:
  - The cardinality of a binding after applying DISTINCT is trivially 1.
  - Only compatible bindings remain after a JOIN, and the cardinalities of the 
bindings that are merged into the new one are multiplied. Hence, if DISTINCT 
was applied beforehand to both arguments of the join, then the cardinality of 
every new binding created from compatible ones must be the product 1 * 1, which 
is 1.
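
The cardinality argument can be tried out with a toy bag model in plain Java. 
This is only a sketch under assumptions: BagJoinSketch and its methods are 
hypothetical names (not Jena API), a bag is modelled as a map from binding to 
cardinality, and the example only covers operands whose bindings all bind the 
same set of variables.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Jena.
class BagJoinSketch {
    // True if the two bindings agree on every variable they share.
    static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    // JOIN: merge compatible bindings and multiply their cardinalities.
    static Map<Map<String, String>, Integer> join(
            Map<Map<String, String>, Integer> left,
            Map<Map<String, String>, Integer> right) {
        Map<Map<String, String>, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<Map<String, String>, Integer> l : left.entrySet()) {
            for (Map.Entry<Map<String, String>, Integer> r : right.entrySet()) {
                if (compatible(l.getKey(), r.getKey())) {
                    Map<String, String> merged = new HashMap<>(l.getKey());
                    merged.putAll(r.getKey());
                    out.merge(merged, l.getValue() * r.getValue(), Integer::sum);
                }
            }
        }
        return out;
    }

    // DISTINCT: keep every binding and reset its cardinality to 1.
    static Map<Map<String, String>, Integer> distinct(
            Map<Map<String, String>, Integer> bag) {
        Map<Map<String, String>, Integer> out = new LinkedHashMap<>();
        for (Map<String, String> binding : bag.keySet()) {
            out.put(binding, 1);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Map<String, String>, Integer> o1 = Map.of(Map.of("s", "a", "x", "b"), 2);
        Map<Map<String, String>, Integer> o2 = Map.of(Map.of("x", "b", "o", "c"), 3);
        System.out.println(join(o1, o2));                     // cardinality 2 * 3
        System.out.println(join(distinct(o1), distinct(o2))); // cardinality 1 * 1
        System.out.println(distinct(join(o1, o2)));           // cardinality 1
    }
}
```

For this input both sides of the rewrite yield the single merged binding with 
cardinality 1, whereas the plain join yields cardinality 6.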


> Algebra.unionDefaultGraph: OpPath not handled
> ---------------------------------------------
>
>                 Key: JENA-1945
>                 URL: https://issues.apache.org/jira/browse/JENA-1945
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.16.0
>            Reporter: Claus Stadler
>            Priority: Major
>
>  
> {code:java}
> System.out.println(
>   Algebra.unionDefaultGraph(
>     Algebra.compile(
>       QueryFactory.create("SELECT * { ?s <urn:p> ?o }"))));
> /* Yields correct result:
> (distinct
>   (graph ??_
>     (bgp (triple ?s <urn:p> ?o))))      
> */
> System.out.println(
>   Algebra.unionDefaultGraph(
>     Algebra.compile(
>       QueryFactory.create("SELECT * { ?s <urn:p1>/<urn:p2>/<urn:p3> ?o }"))));
> /* Yields incorrect result because wrapping with graph ??_ (and distinct) is 
> missing:
> (path ?s (seq (seq <urn:p1> <urn:p2>) <urn:p3>) ?o)
> */
> {code}
>  
> It seems 
> [TransformUnionQuery.java|https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/algebra/TransformUnionQuery.java#L34]
>  lacks the handling of (at least) OpPath


