[ https://issues.apache.org/jira/browse/JENA-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146688#comment-17146688 ]
Julian Gonggrijp commented on JENA-1926:
----------------------------------------
Thank you so much [~andy] and [~rvesse] for taking the time to explain this to
me. I really appreciate it.
I will not debate that ARQ has a reason to behave in the way it does, but
before I can let this rest, I need to be 100% clear on whether these queries
are equivalent or not in a set-theoretic sense. If they are, then I think it is
somewhat problematic that there is a 3x performance difference between them. If
they are not, then I need to understand which version is correct for my
application.
?? The reason they can be non-equivalent is because in the two versions
{{?body}} may be bound to a different set of values.??
I can see, given the nesting of the graph patterns and the specifics of how ARQ
operates, that this may be true _at the time of the {{OPTIONAL}} pattern
evaluation_. What I still cannot wrap my head around, however, is how {{?body}}
may be bound to a different set of values _by the time the entire query has
been evaluated_.
Sure, in the fast version, the {{OPTIONAL}} pattern is tried against any
{{?body}} that matches just the {{?annotation oa:hasBody ?body}} triple
pattern. Initially, this will generate {{?body ?c ?d}} triples that wouldn't be
included in the slow version. But surely, these superfluous triples will be
filtered out again as ARQ joins the result of the {{conditional}} with the
larger {{bgp}}, as this constrains the set of possible values for
{{?annotation}} in the same way as in the slow version?
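To make the question concrete, here is a toy sketch in Python of the two evaluation orders as I understand them. It is not ARQ code: the graph, the shortened predicate names, and the helpers {{match}}, {{join}}, {{left_join}} and {{canon}} are all made up for illustration, and the larger {{bgp}} is reduced to a single {{hasTarget}} pattern. On this data both orders end with the same solutions, even though the fast order produces an extra intermediate row.

```python
# Toy graph (hypothetical data). anno2 has a body but no target, so the
# rest of the pattern rejects it; body3 has no triples of its own.
GRAPH = [
    ("anno1", "hasBody", "body1"), ("anno1", "hasTarget", "target1"),
    ("anno2", "hasBody", "body2"),
    ("anno3", "hasBody", "body3"), ("anno3", "hasTarget", "target3"),
    ("body1", "label", "x"), ("body2", "label", "y"),
]

def match(pattern, graph):
    """Solutions of one triple pattern; terms starting with '?' are variables."""
    solutions = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable bound to two different values
            elif term != value:
                break      # constant does not match
        else:
            solutions.append(binding)
    return solutions

def compatible(m1, m2):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)

def join(left, right):
    return [{**l, **r} for l in left for r in right if compatible(l, r)]

def left_join(left, right):
    """OPTIONAL: extend each left solution if possible, else keep it as is."""
    out = []
    for l in left:
        extended = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(extended or [l])
    return out

def canon(solutions):
    """Order-independent form of a solution sequence, for comparison."""
    return sorted(tuple(sorted(s.items())) for s in solutions)

A = match(("?annotation", "hasBody", "?body"), GRAPH)      # first triple pattern
B = match(("?body", "?c", "?d"), GRAPH)                    # the OPTIONAL part
C = match(("?annotation", "hasTarget", "?target"), GRAPH)  # stand-in for the rest

slow = left_join(join(A, C), B)    # OPTIONAL applied last
intermediate = left_join(A, B)     # OPTIONAL applied first: includes anno2
fast = join(intermediate, C)       # joining with the rest drops anno2 again

print(len(intermediate), len(fast), canon(slow) == canon(fast))
```

The sketch relies on {{?c}} and {{?d}} occurring nowhere outside the {{OPTIONAL}} and on {{?body}} already being bound by the pattern to its left. It says nothing about what ARQ actually does internally, which is exactly what I would like to have confirmed.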
> Query execution speed depends more on WHERE clause order than expected
> ----------------------------------------------------------------------
>
> Key: JENA-1926
> URL: https://issues.apache.org/jira/browse/JENA-1926
> Project: Apache Jena
> Issue Type: Improvement
> Components: ARQ
> Affects Versions: Jena 3.15.0
> Reporter: Julian Gonggrijp
> Priority: Minor
>
> The following query takes about 6.5 seconds with my dataset, which
> unfortunately I cannot share. Note that {{?source}} is bound to a single IRI
> in all queries below; I'm leaving that out for brevity.
> {code:java}
> PREFIX oa: <http://www.w3.org/ns/oa#>
> PREFIX dcterms: <http://purl.org/dc/terms/>
> CONSTRUCT {
>     ?annotation ?a ?b.
>     ?body ?c ?d.
>     ?target ?e ?f.
>     ?selector ?g ?h.
> } WHERE {
>     ?annotation oa:hasBody ?body;
>                 oa:hasTarget ?target;
>                 dcterms:creator ?user;
>                 ?a ?b.
>     ?target oa:hasSource ?source;
>             oa:hasSelector ?selector;
>             ?e ?f.
>     ?selector ?g ?h.
>     OPTIONAL { ?body ?c ?d }.
> }
> {code}
> Compare this to the following query, which I believe is exactly equivalent
> but takes only 2 seconds:
> {code:java}
> CONSTRUCT {
>     ?annotation ?a ?b.
>     ?body ?c ?d.
>     ?target ?e ?f.
>     ?selector ?g ?h.
> } WHERE {
>     ?annotation oa:hasBody ?body.
>     OPTIONAL { ?body ?c ?d }.
>     ?annotation oa:hasTarget ?target;
>                 dcterms:creator ?user;
>                 ?a ?b.
>     ?target oa:hasSource ?source;
>             oa:hasSelector ?selector;
>             ?e ?f.
>     ?selector ?g ?h.
> }
> {code}
> For comparison, leaving out the optional {{?body}} entirely, I get a query
> that executes in 1.7 seconds:
> {code:java}
> CONSTRUCT {
>     ?annotation ?a ?b.
>     ?target ?e ?f.
>     ?selector ?g ?h.
> } WHERE {
>     ?annotation oa:hasTarget ?target;
>                 dcterms:creator ?user;
>                 ?a ?b.
>     ?target oa:hasSource ?source;
>             oa:hasSelector ?selector;
>             ?e ?f.
>     ?selector ?g ?h.
> }
> {code}
> I'm a novice to SPARQL, but coming from SQL, I wouldn't expect query
> execution speed to depend so much on the order in which the criteria are
> given.