Rob Vesse created JENA-709:
------------------------------
Summary: Index join strategy may need to be more conservative when
some sequence elements are potentially expensive
Key: JENA-709
URL: https://issues.apache.org/jira/browse/JENA-709
Project: Apache Jena
Issue Type: Brainstorming
Components: ARQ, Optimizer
Reporter: Rob Vesse
As noted in a mailing list thread discussing a poorly performing query, there are cases where introducing {{sequence}} can actually make a query slower, namely when some elements of the {{sequence}} are expensive to calculate, e.g. sub-queries.
The example query given is:
{noformat}
SELECT DISTINCT ?O ?T ?E
WHERE
{
  ?E a x:E .
  {
    SELECT ?O ?T
    WHERE
    {
      ?O :oE ?E ;
         :oT ?T .
    }
    ORDER BY DESC(?T)
    LIMIT 3
  }
}
{noformat}
This produces the following algebra:
{noformat}
(distinct
  (project (?O ?T ?E)
    (sequence
      (bgp (triple ?E rdf:type x:E))
      (project (?O ?T)
        (top (3 (desc ?T))
          (bgp
            (triple ?O :oE ?/E)
            (triple ?O :oT ?T)
          ))))))
{noformat}
Because of variable scoping, the two elements of the {{sequence}} share no variables (the sub-query's {{?E}} is renamed to {{?/E}}), so substituting the bindings from the first sequence element into the sub-query has no effect. As a result, the expensive sub-query (note the {{top}} operator) is executed in full for every single LHS solution.
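To make the cost difference concrete, here is a toy model in Java. The class and method names are hypothetical and this is not ARQ's actual evaluator; it assumes 1000 LHS solutions and a sub-query that always yields the same 3 rows because it shares no variables with the LHS. The {{sequence}} strategy evaluates the RHS once per LHS solution with that solution substituted in, while a plain {{join}} evaluates the RHS once and combines the results (a cross product, since there are no shared variables). Both produce the same rows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not ARQ code) counting sub-query evaluations. */
public class SequenceVsJoin {
    // Number of times the "expensive" sub-query has been evaluated.
    static int rhsEvaluations = 0;

    // Stand-in for the top-3 sub-query. It shares no variables with the LHS,
    // so the parent binding cannot restrict it: it always yields the same 3 rows,
    // each extended with the parent's bindings.
    static List<Map<String, String>> evalSubQuery(Map<String, String> parent) {
        rhsEvaluations++;
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            Map<String, String> row = new HashMap<>(parent);
            row.put("?O", "o" + i);
            row.put("?T", "t" + i);
            rows.add(row);
        }
        return rows;
    }

    // sequence: evaluate the RHS once per LHS solution, using it as the parent binding.
    static List<Map<String, String>> sequence(List<Map<String, String>> lhs) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> parent : lhs) {
            out.addAll(evalSubQuery(parent));
        }
        return out;
    }

    // join: evaluate the RHS once, then merge each LHS row with each RHS row.
    static List<Map<String, String>> join(List<Map<String, String>> lhs) {
        List<Map<String, String>> rhs = evalSubQuery(Map.of());
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : lhs) {
            for (Map<String, String> r : rhs) {
                Map<String, String> row = new HashMap<>(l);
                row.putAll(r);
                out.add(row);
            }
        }
        return out;
    }

    // Builds n LHS solutions binding ?E, mimicking (bgp (triple ?E rdf:type x:E)).
    static List<Map<String, String>> lhs(int n) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(Map.of("?E", "e" + i));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, String>> left = lhs(1000);

        rhsEvaluations = 0;
        List<Map<String, String>> viaSequence = sequence(left);
        int sequenceEvals = rhsEvaluations;

        rhsEvaluations = 0;
        List<Map<String, String>> viaJoin = join(left);
        int joinEvals = rhsEvaluations;

        System.out.println("sequence: " + viaSequence.size() + " rows, sub-query evaluations = " + sequenceEvals);
        System.out.println("join:     " + viaJoin.size() + " rows, sub-query evaluations = " + joinEvals);
        System.out.println("same results: " + viaSequence.equals(viaJoin));
    }
}
```

In this model {{sequence}} runs the sub-query 1000 times and {{join}} runs it once, for the same 3000 rows. The trade-off is that the join must hold the RHS results, which is cheap here because {{top}} limits them to 3 rows.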
It is unclear from the discussion thread so far whether this is simply a badly written query, and we don't have an example dataset that demonstrates the performance problem. However, just looking at the algebra, it seems we would be better off avoiding {{sequence}} in favour of a plain {{join}} in a case like this.
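One possible shape for a more conservative rule, sketched under stated assumptions: the {{Op}} record below is a toy stand-in for an algebra node (not ARQ's {{Op}} interface), and the set of "expensive" operators ({{top}}, {{order}}, {{group}}, {{distinct}}) is an illustrative guess. The hypothetical {{choose}} method falls back to a plain {{join}} only when the two sides share no visible variables, so substitution cannot restrict the RHS, and the RHS contains an expensive operator; otherwise it keeps {{sequence}}. Records require Java 16 or later.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a conservative sequence-vs-join choice (not ARQ code). */
public class ConservativeJoinChoice {

    // Toy algebra node: operator name, variables visible to its parent, sub-operators.
    record Op(String name, Set<String> visibleVars, List<Op> children) {}

    // Operators treated as expensive; this list is an assumption for illustration.
    static final Set<String> EXPENSIVE = Set.of("top", "order", "group", "distinct");

    // True if this node or any descendant is an expensive operator.
    static boolean containsExpensive(Op op) {
        if (EXPENSIVE.contains(op.name())) {
            return true;
        }
        for (Op child : op.children()) {
            if (containsExpensive(child)) {
                return true;
            }
        }
        return false;
    }

    // True if the two sides have at least one visible variable in common.
    static boolean sharesVariables(Op left, Op right) {
        Set<String> common = new HashSet<>(left.visibleVars());
        common.retainAll(right.visibleVars());
        return !common.isEmpty();
    }

    // Returns "join" when substitution is useless and the RHS is expensive, else "sequence".
    static String choose(Op left, Op right) {
        if (!sharesVariables(left, right) && containsExpensive(right)) {
            return "join";
        }
        return "sequence";
    }

    public static void main(String[] args) {
        // Mirrors the algebra above: the sub-query only exposes ?O and ?T.
        Op lhs = new Op("bgp", Set.of("?E"), List.of());
        Op inner = new Op("bgp", Set.of("?O", "?/E", "?T"), List.of());
        Op top = new Op("top", Set.of("?O", "?/E", "?T"), List.of(inner));
        Op rhs = new Op("project", Set.of("?O", "?T"), List.of(top));
        System.out.println(choose(lhs, rhs)); // prints "join"
    }
}
```

A real rule would also need to weigh the memory cost of materialising the RHS for the join against the repeated evaluation cost of {{sequence}}; the sketch ignores that.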