[ 
https://issues.apache.org/jira/browse/JENA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333491#comment-14333491
 ] 

Rob Vesse commented on JENA-885:
--------------------------------

Note that these two queries are semantically different because the {{BIND}} 
sits at a different level of nesting in each variant of the query.

Some variants allow ARQ to apply its index join strategy whereas others prevent 
it.

For example, the first query from your original report produces the following 
algebra:

{noformat}
(prefix ((ex: <http://example.com/>))
  (project (?s ?valueA)
    (extend ((?valueA (if (bound ?labelA) ?labelA ?a)))
      (conditional
        (table unit)
        (conditional
          (bgp (triple ?s ex:propA ?a))
          (bgp (triple ?a ex:label ?labelA)))))))
{noformat}

Whereas the second query produces the following algebra:

{noformat}
(prefix ((ex: <http://example.com/>))
  (project (?s ?valueA)
    (leftjoin
      (table unit)
      (extend ((?valueA (if (bound ?labelA) ?labelA ?a)))
        (conditional
          (bgp (triple ?s ex:propA ?a))
          (bgp (triple ?a ex:label ?labelA)))))))
{noformat}

In the first case the {{OPTIONAL}} is evaluated first using index joins, which 
are much faster, and then the {{BIND}} is calculated over the results.  In the 
second case the {{BIND}} is inside the {{OPTIONAL}}, which appears to block the 
use of index joins at the outer scope of the query.

That being said, looking at the high-level plans, it seems a bit strange that a 
{{leftjoin}} with a {{table unit}} should produce such huge performance 
differences.
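
To make the index join point above more concrete, here is a toy sketch in plain 
Java. It is not ARQ code and does not claim to reflect how ARQ implements 
either operator: the class {{JoinStrategySketch}}, its methods and the sample 
data are all hypothetical. It only contrasts an index-style left join, where 
each left-hand row is used as a lookup key into the right-hand data, with a 
generic left join that compares every left-hand row with every row of an 
already materialised right-hand side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: none of these names come from ARQ.
final class JoinStrategySketch {

    // Mirrors BIND(IF(BOUND(?labelA), ?labelA, ?a) AS ?valueA).
    static String valueA(String a, String labelA) {
        return labelA != null ? labelA : a;
    }

    // Index-style left join: each left row {s, a} is used as a lookup key,
    // so only the label entries for that ?a are touched.
    static List<String[]> indexLeftJoin(List<String[]> left,
                                        Map<String, List<String>> labelsByA) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : left) {
            List<String> labels = labelsByA.get(row[1]);
            if (labels == null || labels.isEmpty()) {
                // No label found: keep the left row, ?labelA stays unbound.
                out.add(new String[] {row[0], valueA(row[1], null)});
            } else {
                for (String label : labels) {
                    out.add(new String[] {row[0], valueA(row[1], label)});
                }
            }
        }
        return out;
    }

    // Generic left join: the right side {a, label} is fully materialised and
    // every left row is compared with every right row.
    static List<String[]> nestedLoopLeftJoin(List<String[]> left,
                                             List<String[]> right) {
        List<String[]> out = new ArrayList<>();
        for (String[] l : left) {
            boolean matched = false;
            for (String[] r : right) {
                if (l[1].equals(r[0])) {
                    out.add(new String[] {l[0], valueA(l[1], r[1])});
                    matched = true;
                }
            }
            if (!matched) {
                out.add(new String[] {l[0], valueA(l[1], null)});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical data: ?s ex:propA ?a rows and ?a ex:label ?labelA rows.
        List<String[]> propA = Arrays.asList(
            new String[] {"s1", "a1"}, new String[] {"s2", "a2"});
        List<String[]> labels = Arrays.asList(
            new String[] {"a1", "Label one"}, new String[] {"a9", "Label nine"});

        // Build the lookup structure used by the index-style join.
        Map<String, List<String>> labelsByA = new HashMap<>();
        for (String[] r : labels) {
            labelsByA.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }

        for (String[] row : indexLeftJoin(propA, labelsByA)) {
            System.out.println("index:  " + Arrays.toString(row));
        }
        for (String[] row : nestedLoopLeftJoin(propA, labels)) {
            System.out.println("nested: " + Arrays.toString(row));
        }
    }
}
```

Both methods return the same rows; they differ only in how much of the 
right-hand data they touch per left-hand row, which is the kind of gap that 
grows with dataset size.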

> Poor performance and timeout failure with BIND in nested OPTIONALs
> ------------------------------------------------------------------
>
>                 Key: JENA-885
>                 URL: https://issues.apache.org/jira/browse/JENA-885
>             Project: Apache Jena
>          Issue Type: Bug
>    Affects Versions: Jena 2.11.2
>            Reporter: Mark Buquor
>         Attachments: ExecuteTestQueries.java, GenerateTestDataset.java
>
>
> There appears to be a performance issue with BIND when used inside nested 
> OPTIONALs. Affected queries fail to time out.
> The following patterns appear to be affected:
> {noformat}
> OPTIONAL { ... OPTIONAL { ... BIND ( ... ) } }
> OPTIONAL { ... OPTIONAL { ... } BIND ( ... ) }
> {noformat}
> The following patterns appear to be unaffected:
> {noformat}
> OPTIONAL { ... OPTIONAL { ... } } BIND ( ... )
> OPTIONAL { ...  BIND ( ... ) }
> OPTIONAL { ... } BIND ( ... )
> {noformat}
> So far, users have been able to work around the performance issue by 
> rewriting their queries. However, the timeout failure is still a significant 
> reliability issue, since affected queries consume resources and can run 
> indefinitely. I've attached a testcase that exhibits the performance and 
> timeout problems. Reproduced with a recent 2.13.0-SNAPSHOT build.
> {noformat}
> Execution Timeout (ms): 30000
> Query: PREFIX ex: <http://example.com/> SELECT ?s ?valueA { OPTIONAL { ?s 
> ex:propA ?a . OPTIONAL { ?a ex:label ?labelA . } } BIND ( IF ( BOUND 
> (?labelA), ?labelA, ?a) as ?valueA) }
> Execution time (ms): 586
> Execution time (ms): 143
> Query: PREFIX ex: <http://example.com/> SELECT ?s ?valueA { OPTIONAL { ?s 
> ex:propA ?a . OPTIONAL { ?a ex:label ?labelA . } BIND ( IF ( BOUND (?labelA), 
> ?labelA, ?a) as ?valueA) } }
> Execution time (ms): 110922
> Execution time (ms): 41004
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
