[
https://issues.apache.org/jira/browse/JENA-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146241#comment-17146241
]
Julian Gonggrijp commented on JENA-1926:
----------------------------------------
Thank you for the quick response, [~andy].
You pointed out that the queries do not have the same structure. Did you
mention this just to explain how it is possible that the queries perform
differently, or did you intend to make the stronger claim that the performance
difference is _justified_?
I am aware that the grouping of the graph patterns is different. Here are both
versions of the query with braces added to make the group graph patterns explicit:
{code:java}
# slow version
{
?annotation oa:hasBody ?body;
oa:hasTarget ?target;
dcterms:creator ?user;
?a ?b.
?target oa:hasSource ?source;
oa:hasSelector ?selector;
?e ?f.
?selector ?g ?h.
}
OPTIONAL { ?body ?c ?d }.{code}
{code:java}
# fast version
{
{ ?annotation oa:hasBody ?body }.
OPTIONAL { ?body ?c ?d }.
}
{
?annotation oa:hasTarget ?target;
dcterms:creator ?user;
?a ?b.
?target oa:hasSource ?source;
oa:hasSelector ?selector;
?e ?f.
?selector ?g ?h.
}{code}
However, I fail to see how these queries could be nonequivalent under any
circumstance, regardless of the specifics of my dataset. As I understand SPARQL
so far, adjacent group graph patterns are combined by conjunction, i.e., a
solution must match both of them. Likewise, all triple patterns within a graph
pattern are combined by conjunction. As conjunction is commutative and
associative, the order and nesting of the triple patterns do not affect the set
of solutions.
I also understand that combining group graph patterns through {{OPTIONAL}} is
neither commutative nor associative. In this query, however, the optional
pattern is a single triple that shares only one variable ({{?body}}) with the
rest of the query, and that variable appears in only one other triple. As long
as that triple is included in the preceding group, which is the case in both
versions of the query, I don't see how the nesting and order of the other
triples could affect the solutions to the overall pattern.
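The argument above can be checked mechanically on a toy example. Below is a minimal Python sketch of the SPARQL algebra operators Join and LeftJoin (OPTIONAL without a filter) over lists of solution mappings, evaluated naively on a tiny made-up dataset. The {{ex:}} IRIs, the {{rdf:value}} triples and all function names are hypothetical and only for illustration, and this is not how Jena evaluates queries. It evaluates the slow form as {{LeftJoin(BGP(A + B), C)}} and the fast form as {{Join(LeftJoin(BGP(A), C), BGP(B))}}, where A is the {{oa:hasBody}} triple, B is the remaining required triples (with {{?source}} fixed to a single IRI) and C is the optional triple.

{code:python}
# Minimal sketch of SPARQL Join / LeftJoin semantics (no FILTER), evaluated
# naively over a tiny hypothetical dataset. Not how Jena evaluates queries.

def is_var(term):
    return term.startswith("?")

def match(pattern, triple, mu):
    """Extend solution mapping mu so that pattern matches triple, or return None."""
    result = dict(mu)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None
            result[p] = t
        elif p != t:
            return None
    return result

def bgp(patterns, data):
    """Basic graph pattern: conjunction of triple patterns, left to right."""
    sols = [{}]
    for pat in patterns:
        sols = [m for mu in sols for t in data
                if (m := match(pat, t, mu)) is not None]
    return sols

def compatible(m1, m2):
    return all(m2[k] == v for k, v in m1.items() if k in m2)

def join(o1, o2):
    return [{**m1, **m2} for m1 in o1 for m2 in o2 if compatible(m1, m2)]

def left_join(o1, o2):
    """OPTIONAL: keep m1 unextended if no mapping in o2 is compatible with it."""
    out = []
    for m1 in o1:
        ext = [{**m1, **m2} for m2 in o2 if compatible(m1, m2)]
        out.extend(ext if ext else [m1])
    return out

def canonical(sols):
    """Order-independent form of a multiset of solutions, for comparison."""
    return sorted(tuple(sorted(m.items())) for m in sols)

# Hypothetical data: ex:body2 has no triples, so OPTIONAL leaves ?c ?d unbound.
DATA = [
    ("ex:ann1", "oa:hasBody", "ex:body1"),
    ("ex:ann1", "oa:hasTarget", "ex:tgt1"),
    ("ex:ann1", "dcterms:creator", "ex:alice"),
    ("ex:tgt1", "oa:hasSource", "ex:src"),
    ("ex:tgt1", "oa:hasSelector", "ex:sel1"),
    ("ex:sel1", "rdf:value", '"chars 1-5"'),
    ("ex:body1", "rdf:value", '"hello"'),
    ("ex:ann2", "oa:hasBody", "ex:body2"),
    ("ex:ann2", "oa:hasTarget", "ex:tgt2"),
    ("ex:ann2", "dcterms:creator", "ex:bob"),
    ("ex:tgt2", "oa:hasSource", "ex:src"),
    ("ex:tgt2", "oa:hasSelector", "ex:sel2"),
    ("ex:sel2", "rdf:value", '"chars 6-9"'),
]

A = [("?annotation", "oa:hasBody", "?body")]
B = [
    ("?annotation", "oa:hasTarget", "?target"),
    ("?annotation", "dcterms:creator", "?user"),
    ("?annotation", "?a", "?b"),
    ("?target", "oa:hasSource", "ex:src"),
    ("?target", "oa:hasSelector", "?selector"),
    ("?target", "?e", "?f"),
    ("?selector", "?g", "?h"),
]
C = [("?body", "?c", "?d")]

slow = left_join(bgp(A + B, DATA), bgp(C, DATA))
fast = join(left_join(bgp(A, DATA), bgp(C, DATA)), bgp(B, DATA))
print(len(slow), canonical(slow) == canonical(fast))  # 12 True
{code}

Because C shares no variable with B, a mapping from C is compatible with a combined A-and-B mapping exactly when it is compatible with the A part, which is why the two forms agree here.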
You also wrote that "If {{?source}} is actually an URI, it will make a
significant difference". I can see that it makes a massive difference whether I
constrain this variable to a single IRI or not. This is the purpose of
including it in the query in the first place. However, I again fail to see how
it could make any semantic difference between the two versions of the query.
The {{?source}} variable does not appear in the {{OPTIONAL}} group and all the
other triples in the overall pattern are basically tied together in a big
conjunction ball where order and nesting do not affect the solutions.
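Semantics aside, order can still affect cost. The following minimal Python sketch, over a hypothetical generated dataset and with made-up helper names, evaluates the same basic graph pattern naively from left to right in two orders and records how many partial solutions exist after each triple pattern. Both orders give the same final solutions, but putting the {{?source}} constraint first keeps the intermediate results small. It does not model Jena's actual optimizer, which may reorder triple patterns within a basic graph pattern on its own.

{code:python}
# Naive left-to-right BGP evaluation that records intermediate result sizes.
# Hypothetical data and helper names; this does not model Jena's optimizer.

def is_var(term):
    return term.startswith("?")

def match(pattern, triple, mu):
    """Extend solution mapping mu so that pattern matches triple, or return None."""
    result = dict(mu)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None
            result[p] = t
        elif p != t:
            return None
    return result

def bgp_with_trace(patterns, data):
    """Evaluate triple patterns in the given order; return solutions and sizes."""
    sols, sizes = [{}], []
    for pat in patterns:
        sols = [m for mu in sols for t in data
                if (m := match(pat, t, mu)) is not None]
        sizes.append(len(sols))
    return sols, sizes

def canonical(sols):
    """Order-independent form of a multiset of solutions, for comparison."""
    return sorted(tuple(sorted(m.items())) for m in sols)

# 100 hypothetical annotations; only the first 2 have the source we ask for.
DATA = []
for i in range(100):
    ann, tgt, sel = f"ex:ann{i}", f"ex:tgt{i}", f"ex:sel{i}"
    src = "ex:src" if i < 2 else f"ex:other{i}"
    DATA += [
        (ann, "oa:hasTarget", tgt),
        (ann, "dcterms:creator", "ex:alice"),
        (tgt, "oa:hasSource", src),
        (tgt, "oa:hasSelector", sel),
    ]

SOURCE_LAST = [
    ("?annotation", "oa:hasTarget", "?target"),
    ("?annotation", "dcterms:creator", "?user"),
    ("?target", "oa:hasSelector", "?selector"),
    ("?target", "oa:hasSource", "ex:src"),
]
SOURCE_FIRST = [SOURCE_LAST[3]] + SOURCE_LAST[:3]

sols_last, sizes_last = bgp_with_trace(SOURCE_LAST, DATA)
sols_first, sizes_first = bgp_with_trace(SOURCE_FIRST, DATA)
print(sizes_last, sizes_first, canonical(sols_last) == canonical(sols_first))
# [100, 100, 100, 2] [2, 2, 2, 2] True
{code}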
> Query execution speed depends more on WHERE clause order than expected
> ----------------------------------------------------------------------
>
> Key: JENA-1926
> URL: https://issues.apache.org/jira/browse/JENA-1926
> Project: Apache Jena
> Issue Type: Improvement
> Components: Fuseki, TDB2
> Affects Versions: Jena 3.15.0
> Reporter: Julian Gonggrijp
> Priority: Minor
>
> The following query takes about 6.5 seconds with my dataset, which
> unfortunately I cannot share. Note that {{?source}} is bound to a single IRI
> in all queries below; I'm leaving that out for brevity.
> {code:java}
> PREFIX oa: <http://www.w3.org/ns/oa#>
> PREFIX dcterms: <http://purl.org/dc/terms/>
> CONSTRUCT {
> ?annotation ?a ?b.
> ?body ?c ?d.
> ?target ?e ?f.
> ?selector ?g ?h.
> } WHERE {
> ?annotation oa:hasBody ?body;
> oa:hasTarget ?target;
> dcterms:creator ?user;
> ?a ?b.
> ?target oa:hasSource ?source;
> oa:hasSelector ?selector;
> ?e ?f.
> ?selector ?g ?h.
> OPTIONAL { ?body ?c ?d }.
> }
> {code}
> Compare this to the following query, which I believe is exactly equivalent
> but takes only 2 seconds:
> {code:java}
> CONSTRUCT {
> ?annotation ?a ?b.
> ?body ?c ?d.
> ?target ?e ?f.
> ?selector ?g ?h.
> } WHERE {
> ?annotation oa:hasBody ?body.
> OPTIONAL { ?body ?c ?d }.
> ?annotation oa:hasTarget ?target;
> dcterms:creator ?user;
> ?a ?b.
> ?target oa:hasSource ?source;
> oa:hasSelector ?selector;
> ?e ?f.
> ?selector ?g ?h.
> }
> {code}
> For comparison, leaving out the optional {{?body}} entirely, I get a query
> that executes in 1.7 seconds:
> {code:java}
> CONSTRUCT {
> ?annotation ?a ?b.
> ?target ?e ?f.
> ?selector ?g ?h.
> } WHERE {
> ?annotation oa:hasTarget ?target;
> dcterms:creator ?user;
> ?a ?b.
> ?target oa:hasSource ?source;
> oa:hasSelector ?selector;
> ?e ?f.
> ?selector ?g ?h.
> }
> {code}
> I'm new to SPARQL, but coming from SQL, I wouldn't expect query
> execution speed to depend so much on the order in which the criteria are
> given.