[ https://issues.apache.org/jira/browse/JENA-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146688#comment-17146688 ]
Julian Gonggrijp edited comment on JENA-1926 at 6/26/20, 10:54 PM:
-------------------------------------------------------------------

Thank you so much [~andy] and [~rvesse] for taking the time to explain this to me. I really appreciate it.

I will not debate that ARQ has a reason to behave in the way it does, but before I can let this rest, I need to be 100% clear on whether these queries are equivalent or not in a set-theoretic sense. If they are, then I think it is somewhat problematic that there is a 3x performance difference between them. If they are not, then I need to understand which version is correct for my application.

??The reason they can be non-equivalent is because in the two versions {{?body}} may be bound to a different set of values.??

I can see, given the nesting of the graph patterns and the specifics of how ARQ operates, that this may be true _at the time of the {{OPTIONAL}} pattern evaluation_. What I still cannot wrap my head around, however, is how {{?body}} may be bound to a different set of values _by the time the entire query has been evaluated_.

Sure, in the fast version, the {{OPTIONAL}} pattern is tried against any {{?body}} that matches just the {{?annotation oa:hasBody ?body}} triple pattern. Initially, this will generate {{?body ?c ?d}} triples that wouldn't be included in the slow version. But surely, these superfluous triples will be filtered out again as ARQ joins the result of the {{conditional}} with the larger {{bgp}}, as this constrains the set of possible values for {{?annotation}} in the same way as in the slow version?


> Query execution speed depends more on WHERE clause order than expected
> ----------------------------------------------------------------------
>
>                 Key: JENA-1926
>                 URL: https://issues.apache.org/jira/browse/JENA-1926
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.15.0
>            Reporter: Julian Gonggrijp
>            Priority: Minor
>
> The following query takes about 6.5 seconds with my dataset, which
> unfortunately I cannot share. Note that {{?source}} is bound to a single IRI
> in all queries below; I'm leaving that out for brevity.
> {code:java}
> PREFIX oa: <http://www.w3.org/ns/oa#>
> PREFIX dcterms: <http://purl.org/dc/terms/>
> CONSTRUCT {
>   ?annotation ?a ?b.
>   ?body ?c ?d.
>   ?target ?e ?f.
>   ?selector ?g ?h.
> } WHERE {
>   ?annotation oa:hasBody ?body;
>               oa:hasTarget ?target;
>               dcterms:creator ?user;
>               ?a ?b.
>   ?target oa:hasSource ?source;
>           oa:hasSelector ?selector;
>           ?e ?f.
>   ?selector ?g ?h.
>   OPTIONAL { ?body ?c ?d }.
> }
> {code}
> Compare this to the following query, which I believe is exactly equivalent
> but takes only 2 seconds:
> {code:java}
> CONSTRUCT {
>   ?annotation ?a ?b.
>   ?body ?c ?d.
>   ?target ?e ?f.
>   ?selector ?g ?h.
> } WHERE {
>   ?annotation oa:hasBody ?body.
>   OPTIONAL { ?body ?c ?d }.
>   ?annotation oa:hasTarget ?target;
>               dcterms:creator ?user;
>               ?a ?b.
>   ?target oa:hasSource ?source;
>           oa:hasSelector ?selector;
>           ?e ?f.
>   ?selector ?g ?h.
> }
> {code}
> For comparison, leaving out the optional {{?body}} entirely, I get a query
> that executes in 1.7 seconds:
> {code:java}
> CONSTRUCT {
>   ?annotation ?a ?b.
>   ?target ?e ?f.
>   ?selector ?g ?h.
> } WHERE {
>   ?annotation oa:hasTarget ?target;
>               dcterms:creator ?user;
>               ?a ?b.
>   ?target oa:hasSource ?source;
>           oa:hasSelector ?selector;
>           ?e ?f.
>   ?selector ?g ?h.
> }
> {code}
> I'm a novice to SPARQL, but coming from SQL, I wouldn't expect query
> execution speed to depend so much on the order in which the criteria are
> given.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
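
The set-theoretic question in the comment (whether joining before or after the {{OPTIONAL}} changes the final solutions) can be tried out on toy data. The following is only a minimal sketch, not ARQ's implementation: solution mappings are modelled as Python dicts, the triples and the names `match`, `join`, `left_join` and `as_set` are hypothetical, and the patterns are trimmed to three triple patterns. It compares solutions as sets, so it says nothing about duplicates or about performance, and agreement on one toy dataset is not a proof for the general case.

```python
# Toy model of SPARQL solution mappings as dicts; NOT how ARQ evaluates queries.
# All triples below are made-up illustration data.
TRIPLES = {
    ("anno1", "oa:hasBody", "body1"),
    ("anno1", "oa:hasTarget", "target1"),
    ("anno2", "oa:hasBody", "body2"),   # anno2 has no target
    ("body1", "rdf:value", "hello"),
    ("body2", "rdf:value", "dropped"),
}


def match(pattern):
    """Solutions for one triple pattern; terms starting with '?' are variables."""
    solutions = []
    for triple in TRIPLES:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            solutions.append(binding)
    return solutions


def compatible(a, b):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def join(left, right):
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def left_join(left, right):
    """OPTIONAL without a filter: extend where possible, otherwise keep as-is."""
    out = []
    for a in left:
        extended = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(extended or [a])
    return out


def as_set(solutions):
    return {frozenset(s.items()) for s in solutions}


has_body = match(("?annotation", "oa:hasBody", "?body"))
has_target = match(("?annotation", "oa:hasTarget", "?target"))
body_props = match(("?body", "?c", "?d"))

# "Slow" shape: join everything first, apply the OPTIONAL last.
slow = left_join(join(has_body, has_target), body_props)
# "Fast" shape: apply the OPTIONAL early, join with the rest afterwards.
fast = join(left_join(has_body, body_props), has_target)

print(len(left_join(has_body, body_props)))  # prints 2: includes the anno2 row
print(as_set(slow) == as_set(fast))          # prints True for this toy data
```

In this toy run the early {{OPTIONAL}} does produce an extra intermediate row for `anno2`, and the later join on `?annotation` discards it, which matches the intuition described in the comment.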