[ https://issues.apache.org/jira/browse/JENA-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017518#comment-14017518 ]
Andy Seaborne commented on JENA-709:
------------------------------------
Agreed - it might as well do a loop join, because it's a cross product.
If there are common variables, it does avoid using `sequence`:
{noformat}
PREFIX : <http://example/>
SELECT * {
  ?s :p ?o .
  { SELECT ?o { ?s ?p ?o } ORDER BY ?o LIMIT 5 }
}
{noformat}
{noformat}
(prefix ((: <http://example/>))
  (join
    (bgp (triple ?s :p ?o))
    (project (?o)
      (top (5 ?o)
        (bgp (triple ?/s ?/p ?o))))))
{noformat}
so it looks like it should do this (use `join`) in the cross-product case anyway.
Or reverse the join order, because with the sub-query first:
{noformat}
PREFIX : <http://example/>
SELECT * {
  { SELECT ?o { ?s ?p ?o } ORDER BY ?o LIMIT 5 }
  ?s :p ?o .
}
{noformat}
{noformat}
(prefix ((: <http://example/>))
  (sequence
    (project (?o)
      (top (5 ?o)
        (bgp (triple ?/s ?/p ?o))))
    (bgp (triple ?s :p ?o))))
{noformat}
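To make the "common variables" point concrete, here is a rough Python sketch. It is not ARQ code: the helper names `variables` and `shared` and the regular expression are illustrative assumptions. It lists the variables each side of the join exposes, treating scope-renamed variables such as ?/s as different names from ?s (as the algebra above does) and counting only what the sub-query projects.

```python
import re

# Matches ?s as well as scope-renamed variables such as ?/s or ?//s.
VAR = re.compile(r"\?/*[A-Za-z_]\w*")


def variables(fragment):
    """Return the set of variable names that appear in an algebra fragment."""
    return set(VAR.findall(fragment))


def shared(lhs_visible, rhs_visible):
    """Variables visible on both sides of a join; empty means a cross product."""
    return lhs_visible & rhs_visible


# First example in this comment: the sub-query projects ?o, which the
# outer pattern also uses, so the two sides have a variable in common.
outer = variables("(bgp (triple ?s :p ?o))")
inner = variables("(bgp (triple ?/s ?/p ?o))")
projected = variables("(project (?o)")
print(sorted(shared(outer, projected)))

# Query from the issue description: the sub-query projects only ?O and ?T,
# and its ?E has been renamed to ?/E, so nothing is shared with the outer ?E.
issue_outer = variables("(bgp (triple ?E rdf:type x:E))")
issue_projected = variables("(project (?O ?T)")
print(sorted(shared(issue_outer, issue_projected)))
```

An empty result for the second case is what makes that join a cross product.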
> Index join strategy may need to be more conservative when some sequence
> elements are potentially expensive
> ----------------------------------------------------------------------------------------------------------
>
> Key: JENA-709
> URL: https://issues.apache.org/jira/browse/JENA-709
> Project: Apache Jena
> Issue Type: Brainstorming
> Components: ARQ, Optimizer
> Reporter: Rob Vesse
> Priority: Minor
>
> As noted in a discussion of a poorly performing query on a mailing list
> thread (http://s.apache.org/cAn), there are cases where the introduction of
> {{sequence}} can actually make the query slower when some elements in the
> {{sequence}} are expensive to calculate, e.g. sub-queries.
> The example query given is:
> {noformat}
> SELECT DISTINCT ?O ?T ?E
> WHERE
> {
>   ?E a x:E.
>   {
>     SELECT ?O ?T
>     WHERE
>     {
>       ?O :oE ?E ;
>          :oT ?T .
>     }
>     ORDER BY DESC(?T)
>     LIMIT 3
>   }
> }
> {noformat}
> Which produces the following algebra:
> {noformat}
> (distinct
>   (project (?O ?T ?E)
>     (sequence
>       (bgp (triple ?E rdf:type x:E))
>       (project (?O ?T)
>         (top (3 (desc ?T))
>           (bgp
>             (triple ?O :oE ?/E)
>             (triple ?O :oT ?T)
>           ))))))
> {noformat}
> Because there are no common variables due to scoping, the substitution of the
> bindings from the first sequence element into the sub-query has no effect, so
> the expensive sub-query (note the {{top}} operator) gets executed in full for
> every single LHS solution.
> It is unclear from the discussion thread so far whether this is just a badly
> written query, and we don't have an example dataset that demonstrates the
> performance problems, but just looking at the algebra it seems we would
> be better off avoiding {{sequence}} in favour of a plain {{join}} in a
> case like this.
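
The cost described in the issue can be illustrated with a toy model. This is a sketch in plain Python, not how ARQ implements `sequence` or `join`; the names `TopSubquery`, `run_sequence`, `run_join` and `merge`, and the sample data, are made up for illustration. It assumes, as in the algebra above, that the sub-query shares no variable with the left-hand side, so the binding passed in cannot restrict it.

```python
from itertools import product


def merge(a, b):
    """Merge two solution mappings; None if they disagree on a shared variable."""
    if any(a[k] != b[k] for k in a.keys() & b.keys()):
        return None
    return {**a, **b}


class TopSubquery:
    """Stand-in for (project (?O ?T) (top (N (desc ?T)) ...))."""

    def __init__(self, rows, limit):
        self.rows = rows
        self.limit = limit
        self.calls = 0  # number of times the sub-query was evaluated in full

    def evaluate(self, binding):
        # `binding` is accepted but unused: the inner ?E was renamed to ?/E,
        # so substituting the outer ?E cannot narrow this pattern.
        self.calls += 1
        ranked = sorted(self.rows, key=lambda r: r["T"], reverse=True)
        return [{"O": r["O"], "T": r["T"]} for r in ranked[: self.limit]]


def run_sequence(lhs_rows, sub):
    """Substitution style: re-evaluate the RHS for every LHS solution."""
    out = []
    for left in lhs_rows:
        for right in sub.evaluate(left):
            row = merge(left, right)
            if row is not None:
                out.append(row)
    return out


def run_join(lhs_rows, sub):
    """Plain join: evaluate the RHS once, then pair it with every LHS solution."""
    rhs_rows = sub.evaluate({})
    out = []
    for left, right in product(lhs_rows, rhs_rows):
        row = merge(left, right)
        if row is not None:
            out.append(row)
    return out


# Hypothetical data: 100 LHS solutions for ?E and a small table for the sub-query.
lhs = [{"E": "e%d" % i} for i in range(100)]
table = [{"O": "o%d" % i, "T": i} for i in range(10)]

seq_sub, join_sub = TopSubquery(table, 3), TopSubquery(table, 3)
same = run_sequence(lhs, seq_sub) == run_join(lhs, join_sub)
print("same results:", same)
print("sub-query evaluations:", seq_sub.calls, "vs", join_sub.calls)
```

In this model both strategies return the same rows, but the substitution style evaluates the sub-query once per left-hand solution while the plain join evaluates it once.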