[jira] [Commented] (JENA-1926) Query execution speed depends more on WHERE clause order than expected

Julian Gonggrijp (Jira) Fri, 26 Jun 2020 04:31:34 -0700


    [ 
https://issues.apache.org/jira/browse/JENA-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146241#comment-17146241
 ]


Julian Gonggrijp commented on JENA-1926:
----------------------------------------

Thank you for the quick response, [~andy].

You pointed out that the queries do not have the same structure. Did you 
mention this just to explain how it is possible that the queries perform 
differently, or did you intend to make the stronger claim that the performance 
difference is _justified_?

I am aware that the grouping of the graph patterns is different. With braces 
added to make the group graph patterns explicit, the two versions of the query 
look like this:

 
{code:java}
# slow version
{
    ?annotation oa:hasBody ?body;
                oa:hasTarget ?target;
                dcterms:creator ?user;
                ?a ?b.
    ?target oa:hasSource ?source;
            oa:hasSelector ?selector;
            ?e ?f.
    ?selector ?g ?h.
}
OPTIONAL { ?body ?c ?d }.{code}
{code:java}
# fast version
{
    { ?annotation oa:hasBody ?body }.
    OPTIONAL { ?body ?c ?d }.
}
{
    ?annotation oa:hasTarget ?target;
                dcterms:creator ?user;
                ?a ?b.
    ?target oa:hasSource ?source;
            oa:hasSelector ?selector;
            ?e ?f.
    ?selector ?g ?h.
}{code}
However, I fail to see how these queries could be nonequivalent under any 
circumstance, regardless of the specifics of my dataset. As I understand 
SPARQL so far, adjacent group graph patterns are combined by conjunction, 
i.e., a solution must match both of them. Likewise, all triple patterns within 
a graph pattern are combined by conjunction. As conjunction is commutative and 
associative, the order and nesting of the triple patterns do not affect the 
set of solutions.

I also understand that combining group graph patterns through {{OPTIONAL}} is 
neither commutative nor associative, but in this query, the optional pattern 
involves only one triple with only one shared variable, that is shared with 
only one other triple. As long as this other triple is included in the 
preceding group, which is the case for both versions of the query, I don't see 
how the nesting and order of other triples could affect the solutions to the 
overall pattern.
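
To make this argument concrete, here is a minimal sketch in Python of the 
reasoning above. It is a toy model, not Jena code and not a claim about how 
Jena evaluates queries: solution mappings are plain dicts, {{join}} stands in 
for conjunction and {{left_join}} for {{OPTIONAL}} without a filter. The 
helper names and all data values are invented for illustration. {{A}} plays 
the role of the {{oa:hasBody}} triple, {{B}} the optional triple and {{C}} all 
remaining triples, which share no variable with {{B}}.

```python
from collections import Counter


def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)


def join(left, right):
    """Conjunction: merge every compatible pair of mappings."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def left_join(left, right):
    """OPTIONAL without a filter: extend a left mapping where possible,
    otherwise keep it unchanged."""
    out = []
    for a in left:
        matches = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(matches if matches else [a])
    return out


def as_bag(solutions):
    """Order-insensitive multiset view of a list of mappings."""
    return Counter(frozenset(m.items()) for m in solutions)


# A ~ { ?annotation oa:hasBody ?body }  (invented toy data)
A = [
    {"annotation": "a1", "body": "b1"},
    {"annotation": "a2", "body": "b2"},
    {"annotation": "a3", "body": "b3"},
]
# B ~ { ?body ?c ?d }; b2 has no triples, so OPTIONAL leaves it unextended
B = [
    {"body": "b1", "c": "p1", "d": "x"},
    {"body": "b1", "c": "p2", "d": "y"},
    {"body": "b3", "c": "p1", "d": "z"},
]
# C ~ all other triple patterns; shares only ?annotation with A
C = [
    {"annotation": "a1", "target": "t1"},
    {"annotation": "a1", "target": "t2"},
    {"annotation": "a2", "target": "t3"},
]

slow = left_join(join(A, C), B)  # OPTIONAL after the big conjunction
fast = join(left_join(A, B), C)  # OPTIONAL right after oa:hasBody

same_solutions = as_bag(slow) == as_bag(fast)
join_commutes = as_bag(join(A, C)) == as_bag(join(C, A))
optional_commutes = as_bag(left_join(A, B)) == as_bag(left_join(B, A))
```

In this toy dataset, both groupings produce the same bag of solutions and the 
conjunction can be reordered freely, while swapping the two sides of the 
{{OPTIONAL}} changes the result. This only illustrates the argument on one 
small example; it does not prove equivalence in general.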

You also wrote that "If {{?source}} is actually an URI, it will make a 
significant difference". I can see that it makes a massive difference whether I 
constrain this variable to a single IRI or not. This is the purpose of 
including it in the query in the first place. However, I again fail to see how 
it could make any semantic difference between the two versions of the query. 
The {{?source}} variable does not appear in the {{OPTIONAL}} group and all the 
other triples in the overall pattern are basically tied together in a big 
conjunction ball where order and nesting do not affect the solutions.

 

> Query execution speed depends more on WHERE clause order than expected
> ----------------------------------------------------------------------
>
>                 Key: JENA-1926
>                 URL: https://issues.apache.org/jira/browse/JENA-1926
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: Fuseki, TDB2
>    Affects Versions: Jena 3.15.0
>            Reporter: Julian Gonggrijp
>            Priority: Minor
>
> The following query takes about 6.5 seconds with my dataset, which 
> unfortunately I cannot share. Note that {{?source}} is bound to a single IRI 
> in all queries below; I'm leaving that out for brevity.
> {code:java}
> PREFIX oa: <http://www.w3.org/ns/oa#>
> PREFIX dcterms: <http://purl.org/dc/terms/>
> CONSTRUCT {
>     ?annotation ?a ?b.
>     ?body ?c ?d.
>     ?target ?e ?f.
>     ?selector ?g ?h.
> } WHERE {
>     ?annotation oa:hasBody ?body;
>                 oa:hasTarget ?target;
>                 dcterms:creator ?user;
>                 ?a ?b.
>     ?target oa:hasSource ?source;
>             oa:hasSelector ?selector;
>             ?e ?f.
>     ?selector ?g ?h.
>     OPTIONAL { ?body ?c ?d }.
> }
> {code}
> Compare this to the following query, which I believe is exactly equivalent 
> but takes only 2 seconds:
> {code:java}
> CONSTRUCT {
>     ?annotation ?a ?b.
>     ?body ?c ?d.
>     ?target ?e ?f.
>     ?selector ?g ?h.
> } WHERE {
>     ?annotation oa:hasBody ?body.
>     OPTIONAL { ?body ?c ?d }.
>     ?annotation oa:hasTarget ?target;
>                 dcterms:creator ?user;
>                 ?a ?b.
>     ?target oa:hasSource ?source;
>             oa:hasSelector ?selector;
>             ?e ?f.
>     ?selector ?g ?h.
> }
> {code}
>  For comparison, leaving out the optional {{?body}} entirely, I get a query 
> that executes in 1.7 seconds:
> {code:java}
> CONSTRUCT {
>     ?annotation ?a ?b.
>     ?target ?e ?f.
>     ?selector ?g ?h.
> } WHERE {
>     ?annotation oa:hasTarget ?target;
>                 dcterms:creator ?user;
>                 ?a ?b.
>     ?target oa:hasSource ?source;
>             oa:hasSelector ?selector;
>             ?e ?f.
>     ?selector ?g ?h.
> }
> {code}
> I'm a novice to SPARQL, but coming from SQL, I wouldn't expect query 
> execution speed to depend so much on the order in which the criteria are 
> given.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
