[
https://issues.apache.org/jira/browse/JENA-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rob Vesse updated JENA-709:
---------------------------
Priority: Minor (was: Major)
> Index join strategy may need to be more conservative when some sequence
> elements are potentially expensive
> ----------------------------------------------------------------------------------------------------------
>
> Key: JENA-709
> URL: https://issues.apache.org/jira/browse/JENA-709
> Project: Apache Jena
> Issue Type: Brainstorming
> Components: ARQ, Optimizer
> Reporter: Rob Vesse
> Priority: Minor
>
> As noted in a discussion of a poorly performing query on a mailing list
> thread (http://s.apache.org/cAn), there are cases where introducing a
> {{sequence}} can actually make the query slower, because some elements in the
> {{sequence}} are expensive to calculate, e.g. sub-queries.
> The example query given is:
> {noformat}
> SELECT DISTINCT ?O ?T ?E
> WHERE
> {
>   ?E a x:E.
>   {
>     SELECT ?O ?T
>     WHERE
>     {
>       ?O :oE ?E ;
>          :oT ?T .
>     }
>     ORDER BY DESC(?T)
>     LIMIT 3
>   }
> }
> {noformat}
> Which produces the following algebra:
> {noformat}
> (distinct
>   (project (?O ?T ?E)
>     (sequence
>       (bgp (triple ?E rdf:type x:E))
>       (project (?O ?T)
>         (top (3 (desc ?T))
>           (bgp
>             (triple ?O :oE ?/E)
>             (triple ?O :oT ?T)
>           ))))))
> {noformat}
> Because scoping leaves the two sequence elements with no common variables
> (the sub-query's {{?E}} appears as {{?/E}} in the algebra), substituting the
> bindings from the first sequence element into the sub-query has no effect.
> The expensive sub-query (note the {{top}} operator) therefore gets executed
> in full for every single LHS solution.
> It is unclear from the discussion thread so far whether this is just a badly
> written query, and we don't have an example dataset that demonstrates the
> performance problem. Just looking at the algebra, though, it seems we would
> be better off avoiding {{sequence}} in favour of a plain {{join}} in a
> case like this.
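
To make the cost difference concrete, here is a minimal Python sketch of the two evaluation strategies. It is a toy model, not ARQ code: the function names (make_subquery, eval_sequence, eval_join) and the data are hypothetical, and the sub-query is reduced to a sort plus a limit of 3. It assumes, as in the algebra above, that the two sides share no variables, so substitution cannot restrict the right-hand side.

```python
from itertools import product


def make_subquery(data, counter):
    """Hypothetical stand-in for the expensive sub-query (ORDER BY DESC(?T) LIMIT 3)."""
    def evaluate(binding):
        # `binding` is deliberately ignored: the sub-query only exposes ?O and ?T,
        # so substituting a value for the outer ?E cannot narrow the evaluation.
        counter["runs"] += 1
        top = sorted(data, key=lambda row: row["T"], reverse=True)[:3]
        return [{"O": row["O"], "T": row["T"]} for row in top]
    return evaluate


def eval_sequence(lhs, subquery):
    """Sequence-style evaluation: re-evaluate the RHS once per LHS solution."""
    results = []
    for left in lhs:
        for right in subquery(left):
            results.append({**left, **right})
    return results


def eval_join(lhs, subquery):
    """Plain join: evaluate the RHS once, then combine with every LHS solution.

    With no shared variables every pair is compatible, so this is a cross product.
    """
    rhs = subquery({})
    return [{**left, **right} for left, right in product(lhs, rhs)]


# Toy data (assumed): 100 solutions for ?E, 10 candidate rows for the sub-query.
lhs = [{"E": f"e{i}"} for i in range(100)]
data = [{"O": f"o{i}", "T": i} for i in range(10)]

seq_counter = {"runs": 0}
join_counter = {"runs": 0}
seq_results = eval_sequence(lhs, make_subquery(data, seq_counter))
join_results = eval_join(lhs, make_subquery(data, join_counter))

print("sequence sub-query runs:", seq_counter["runs"])
print("join sub-query runs:", join_counter["runs"])
```

Both strategies return the same solutions in this model; only the number of sub-query evaluations differs, which is the cost the issue is concerned with.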
--
This message was sent by Atlassian JIRA
(v6.2#6252)