[ 
https://issues.apache.org/jira/browse/JENA-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147115#comment-17147115
 ] 

Julian Gonggrijp commented on JENA-1926:
----------------------------------------

[~andy] At first, I was convinced that the queries are equivalent, so I 
reported this issue as an apparent pitfall. When you suggested that the queries 
are not equivalent, my goal changed. Now I just want to know whether and why 
the queries are truly nonequivalent.

This new goal is due to an application that I'm developing. If the queries are 
not equivalent, one of them is probably incorrect for my application. Moreover, 
it means I cannot entirely trust myself to write correct queries until I fully 
understand the difference. This makes it important for me to get to the bottom 
of this. I wrote "in a set-theoretic sense" because I cannot make assumptions 
about my dataset; it will grow at least tenfold beyond its current size in ways 
that I cannot entirely predict at this time. The query should be fundamentally 
correct rather than accidentally correct.
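
To make the "set-theoretic" question concrete, here is a minimal sketch in Python of the two WHERE shapes as I understand them: OPTIONAL applied last versus OPTIONAL applied right after the {{oa:hasBody}} pattern. Everything in it is an assumption for illustration only: the toy triples, the helper names ({{match}}, {{join}}, {{left_join}}, {{as_bag}}) and the reduction of each query to three triple patterns. It is not how ARQ evaluates anything, and agreement on one toy dataset does not prove equivalence in general.

```python
from collections import Counter

# Toy triples (hypothetical data, not taken from the real dataset).
TRIPLES = {
    ("ann1", "oa:hasBody", "body1"),
    ("ann1", "oa:hasTarget", "tgt1"),
    ("ann2", "oa:hasBody", "body2"),
    ("ann2", "oa:hasTarget", "tgt2"),
    ("body1", "rdf:value", "hello"),
    # body2 has no outgoing triples, so the OPTIONAL part finds nothing for it.
}


def match(pattern, triples=TRIPLES):
    """Return the solutions (variable -> value dicts) of one triple pattern."""
    solutions = []
    for triple in sorted(triples):
        solution = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if solution.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            solutions.append(solution)
    return solutions


def compatible(a, b):
    """Two solutions are compatible if they agree on all shared variables."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def join(left, right):
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def left_join(left, right):
    """Keep each left solution, extended by the right side where possible."""
    result = []
    for a in left:
        extended = [{**a, **b} for b in right if compatible(a, b)]
        result.extend(extended or [a])
    return result


def as_bag(solutions):
    """Order-independent multiset view of a solution list."""
    return Counter(frozenset(s.items()) for s in solutions)


A = match(("?annotation", "oa:hasBody", "?body"))
B = match(("?body", "?c", "?d"))  # the OPTIONAL part
C = match(("?annotation", "oa:hasTarget", "?target"))

optional_last = left_join(join(A, C), B)   # shape of the first query
optional_early = join(left_join(A, B), C)  # shape of the second query
same = as_bag(optional_last) == as_bag(optional_early)
print(same)
```

On this toy data the two shapes produce the same bag of solutions. That only tells me the queries are not trivially different; it says nothing about datasets I cannot predict.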

If they are in fact equivalent, then it's just a matter of the optimizer not 
being smart enough, as you said before. In that case, I can move on, since I have 
a variant of the query that is fast enough for my application. I won't blame 
you for erring on the side of caution and letting the performance difference 
persist. If I were an active maintainer of Jena, I might want to document it as 
a pitfall, but this is up to you.

I recognize that you have tried to help me and I understand that there is a 
limit to the amount of time you can spend on this issue. If, for whatever 
reason, you decide to close the issue here, I will take no offense and seek 
expert advice elsewhere. When I'm wiser, I will return here to answer my own 
question in a way that my current self can understand, in case anyone else runs 
into a similar situation.

In any case, thank you for your effort so far.

> Query execution speed depends more on WHERE clause order than expected
> ----------------------------------------------------------------------
>
>                 Key: JENA-1926
>                 URL: https://issues.apache.org/jira/browse/JENA-1926
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.15.0
>            Reporter: Julian Gonggrijp
>            Priority: Minor
>
> The following query takes about 6.5 seconds with my dataset, which 
> unfortunately I cannot share. Note that {{?source}} is bound to a single IRI 
> in all queries below; I'm leaving that out for brevity.
> {code:java}
> PREFIX oa: <http://www.w3.org/ns/oa#>
> PREFIX dcterms: <http://purl.org/dc/terms/>
> CONSTRUCT {
>     ?annotation ?a ?b.
>     ?body ?c ?d.
>     ?target ?e ?f.
>     ?selector ?g ?h.
> } WHERE {
>     ?annotation oa:hasBody ?body;
>                 oa:hasTarget ?target;
>                 dcterms:creator ?user;
>                 ?a ?b.
>     ?target oa:hasSource ?source;
>             oa:hasSelector ?selector;
>             ?e ?f.
>     ?selector ?g ?h.
>     OPTIONAL { ?body ?c ?d }.
> }
> {code}
> Compare this to the following query, which I believe is exactly equivalent 
> but takes only 2 seconds:
> {code:java}
> CONSTRUCT {
>     ?annotation ?a ?b.
>     ?body ?c ?d.
>     ?target ?e ?f.
>     ?selector ?g ?h.
> } WHERE {
>     ?annotation oa:hasBody ?body.
>     OPTIONAL { ?body ?c ?d }.
>     ?annotation oa:hasTarget ?target;
>                 dcterms:creator ?user;
>                 ?a ?b.
>     ?target oa:hasSource ?source;
>             oa:hasSelector ?selector;
>             ?e ?f.
>     ?selector ?g ?h.
> }
> {code}
> For comparison, leaving out the optional {{?body}} entirely, I get a query 
> that executes in 1.7 seconds:
> {code:java}
> CONSTRUCT {
>     ?annotation ?a ?b.
>     ?target ?e ?f.
>     ?selector ?g ?h.
> } WHERE {
>     ?annotation oa:hasTarget ?target;
>                 dcterms:creator ?user;
>                 ?a ?b.
>     ?target oa:hasSource ?source;
>             oa:hasSelector ?selector;
>             ?e ?f.
>     ?selector ?g ?h.
> }
> {code}
> I'm a novice to SPARQL, but coming from SQL, I wouldn't expect query 
> execution speed to depend so much on the order in which the criteria are 
> given.



