[ https://issues.apache.org/jira/browse/JENA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346892#comment-14346892 ]
Andy Seaborne commented on JENA-885:
------------------------------------
Partial analysis. The queries may look similar but they exhibit different
execution possibilities so picking one for discussion:
{noformat}
PREFIX ex: <http://example.com/>
SELECT ?s ?valueA
WHERE
{
  OPTIONAL
  { ?s ex:propA ?a
    OPTIONAL
    { ?a ex:label ?labelA}
    BIND(if(bound(?labelA), ?labelA, ?a) AS ?valueA)
  }
} LIMIT 1000
{noformat}
Optimized algebra:
{noformat}
(slice _ 1000
  (project (?s ?valueA)
    (leftjoin
      (bgp (triple ?s <http://example.com/propA> ?a))
      (extend ((?valueA (if (bound ?labelA) ?labelA ?a)))
        (conditional
          (bgp (triple ?s <http://example.com/propA> ?a))
          (bgp (triple ?a <http://example.com/label> ?labelA)))))))
{noformat}
# The query is being executed bottom up at the top level (i.e. the
{{leftjoin}}). It does not need to be in this case, although in some of the
other queries it might be necessary.
# The timeout is missed - looks like a separate issue to efficient execution.
# The cost comes from the fact that the {{limit}} is not moved inwards, so the
{{conditional}} is 1048576 rows of evaluation where it need only be 1000. This
is coupled with Java's unhelpfully slow growth of {{ArrayList}}.
# Excessive execution is made worse by JENA-801 (this is an effect - not a
cause)
# The top level {{leftjoin}} can be made a lot faster in this specific case.
The general case is unclear within the current framework, though there is a
separate eval engine, "quack", which is worth trying out for this. This is a
non-issue if the limit is placed better in the algebra or in execution.
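
Point 3 above can be illustrated outside Jena with a small, self-contained
sketch. This is not ARQ code: {{LimitPlacementSketch}}, {{rows}},
{{materialiseThenSlice}} and {{sliceWhileStreaming}} are hypothetical names, and
the inner pattern is stood in for by a simple counter. It only shows why slicing
after the inner result has been fully built costs every row, while slicing the
lazy stream stops after the limit.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; none of these names exist in Jena/ARQ.
class LimitPlacementSketch {

    // Stand-in for the inner pattern: lazily yields `total` rows and counts
    // how many rows were actually produced.
    static Iterator<Long> rows(final long total, final AtomicLong produced) {
        return new Iterator<Long>() {
            private long cursor = 0;

            @Override
            public boolean hasNext() {
                return cursor < total;
            }

            @Override
            public Long next() {
                produced.incrementAndGet();
                return cursor++;
            }
        };
    }

    // Limit applied outside: the inner result is fully materialised first,
    // then cut down to `limit` rows.
    static List<Long> materialiseThenSlice(long total, int limit, AtomicLong produced) {
        List<Long> all = new ArrayList<>();
        Iterator<Long> it = rows(total, produced);
        while (it.hasNext()) {
            all.add(it.next());
        }
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }

    // Limit moved inwards: stop pulling rows once `limit` rows have been seen.
    static List<Long> sliceWhileStreaming(long total, int limit, AtomicLong produced) {
        List<Long> out = new ArrayList<>(limit);
        Iterator<Long> it = rows(total, produced);
        while (out.size() < limit && it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        AtomicLong outer = new AtomicLong();
        AtomicLong inner = new AtomicLong();
        materialiseThenSlice(1_048_576L, 1000, outer);
        sliceWhileStreaming(1_048_576L, 1000, inner);
        System.out.println("materialise then slice: " + outer.get() + " rows evaluated");
        System.out.println("slice while streaming:  " + inner.get() + " rows evaluated");
    }
}
```

Both methods return the same 1000 rows; only the amount of inner evaluation
differs. The row count 1048576 is taken from the analysis above, everything
else is assumed for illustration.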
I was using a reduced timeout. On my machine with an SSD, it executed in 18s
in the worst (cold) case.
Missing timeout : the internal handling of bottom up execution is not all done
with {{QueryIterator}}s so the timeout mechanism isn't checked during some
internal operations. Fix : use QueryIterators.
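
The missing-timeout point can be sketched in the same hedged way. The code
below is not Jena's {{QueryIterator}} or its actual timeout machinery:
{{TimeoutSketch}}, {{DeadlineIterator}}, {{QueryTimeoutException}} and the
injected clock are hypothetical, and checking a deadline on every call is an
assumption about how such a mechanism could be wired. It shows the general
idea: a deadline is only honoured when rows pass through an iterator that
checks it, so an internal loop over a bare iterator never notices the timeout.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical illustration only; these are not Jena/ARQ classes.
class TimeoutSketch {

    static class QueryTimeoutException extends RuntimeException {
        QueryTimeoutException(String message) {
            super(message);
        }
    }

    // Wraps any iterator and checks the deadline on every hasNext()/next().
    static class DeadlineIterator<T> implements Iterator<T> {
        private final Iterator<T> inner;
        private final long deadline;
        private final LongSupplier clock;

        DeadlineIterator(Iterator<T> inner, long deadline, LongSupplier clock) {
            this.inner = inner;
            this.deadline = deadline;
            this.clock = clock;
        }

        private void checkDeadline() {
            if (clock.getAsLong() >= deadline) {
                throw new QueryTimeoutException("deadline passed");
            }
        }

        @Override
        public boolean hasNext() {
            checkDeadline();
            return inner.hasNext();
        }

        @Override
        public T next() {
            checkDeadline();
            return inner.next();
        }
    }

    // Stand-in for an internal operation that consumes all of its input.
    static <T> long drain(Iterator<T> it) {
        long count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<Integer> rows = Collections.nCopies(100, 1);
        long[] ticks = {0};
        LongSupplier fakeClock = () -> ticks[0]++; // advances one tick per check

        // Bare iterator: nothing ever looks at the clock.
        System.out.println("unchecked rows drained: " + drain(rows.iterator()));

        // Checked iterator: the same loop is interrupted at the deadline.
        try {
            drain(new DeadlineIterator<>(rows.iterator(), 10, fakeClock));
        } catch (QueryTimeoutException e) {
            System.out.println("checked iterator stopped: " + e.getMessage());
        }
    }
}
```

The clock is injected so the behaviour is deterministic in the sketch; a real
implementation would presumably read wall-clock time or a cancellation flag.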
This algebra with an additional placed {{slice}} is fast:
{noformat}
(slice _ 1000
  (project (?s ?valueA)
    (leftjoin
      (table unit)
      (slice _ 1010
        (extend ((?valueA (if (bound ?labelA) ?labelA ?a)))
          (conditional
            (bgp (triple ?s <http://example.com/propA> ?a))
            (bgp (triple ?a <http://example.com/label> ?labelA))))))))
{noformat}
> Poor performance and timeout failure with BIND in nested OPTIONALs
> ------------------------------------------------------------------
>
> Key: JENA-885
> URL: https://issues.apache.org/jira/browse/JENA-885
> Project: Apache Jena
> Issue Type: Bug
> Affects Versions: Jena 2.11.2
> Reporter: Mark Buquor
> Attachments: ExecuteTestQueries.java, GenerateTestDataset.java
>
>
> There appears to be a performance issue with BIND when used inside nested
> OPTIONALs. Affected queries fail to time out.
> The following patterns appear to be affected:
> {noformat}
> OPTIONAL { ... OPTIONAL { ... BIND ( ... ) } }
> OPTIONAL { ... OPTIONAL { ... } BIND ( ... ) }
> {noformat}
> The following patterns appear to be unaffected:
> {noformat}
> OPTIONAL { ... OPTIONAL { ... } } BIND ( ... )
> OPTIONAL { ... BIND ( ... ) }
> OPTIONAL { ... } BIND ( ... )
> {noformat}
> So far, users have been able to work around the performance issue by
> rewriting their queries. However, the timeout failure is still a significant
> reliability issue, since affected queries consume resources and can run
> indefinitely. I've attached a testcase that exhibits the performance and
> timeout problems. Reproduced with a recent 2.13.0-SNAPSHOT build.
> {noformat}
> Execution Timeout (ms): 30000
> Query: PREFIX ex: <http://example.com/> SELECT ?s ?valueA { OPTIONAL { ?s
> ex:propA ?a . OPTIONAL { ?a ex:label ?labelA . } } BIND ( IF ( BOUND
> (?labelA), ?labelA, ?a) as ?valueA) }
> Execution time (ms): 586
> Execution time (ms): 143
> Query: PREFIX ex: <http://example.com/> SELECT ?s ?valueA { OPTIONAL { ?s
> ex:propA ?a . OPTIONAL { ?a ex:label ?labelA . } BIND ( IF ( BOUND (?labelA),
> ?labelA, ?a) as ?valueA) } }
> Execution time (ms): 110922
> Execution time (ms): 41004
> {noformat}