[jira] [Updated] (JENA-709) Index join strategy may need to be more conservative when some sequence elements are potentially expensive

Rob Vesse (JIRA) Wed, 04 Jun 2014 02:07:33 -0700

     [ 
https://issues.apache.org/jira/browse/JENA-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Rob Vesse updated JENA-709:
---------------------------

    Priority: Minor  (was: Major)

> Index join strategy may need to be more conservative when some sequence 
> elements are potentially expensive
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: JENA-709
>                 URL: https://issues.apache.org/jira/browse/JENA-709
>             Project: Apache Jena
>          Issue Type: Brainstorming
>          Components: ARQ, Optimizer
>            Reporter: Rob Vesse
>            Priority: Minor
>
> As noted in a discussion of a poorly performing query on a mailing list 
> thread (http://s.apache.org/cAn), there are cases where introducing 
> {{sequence}} can actually make the query slower because some elements of 
> the {{sequence}} are expensive to calculate, e.g. sub-queries.
> The example query given is:
> {noformat}
> SELECT DISTINCT ?O ?T  ?E
> WHERE
> {  
>   ?E a x:E. 
>   {
>     SELECT ?O ?T 
>     WHERE 
>     {
>       ?O :oE ?E ;
>             :oT ?T .
>     } 
>     ORDER BY DESC(?T)
>     LIMIT 3
>   }
> }
> {noformat}
> This produces the following algebra:
> {noformat}
> (distinct
>  (project (?O ?T ?E)
>   (sequence
>    (bgp (triple ?E rdf:type x:E))
>    (project (?O ?T)
>     (top (3 (desc ?T))
>      (bgp
>       (triple ?O :oE ?/E)
>       (triple ?O :oT ?T)
>      ))))))
> {noformat}
> The sub-query does not project {{?E}}, so its inner {{?E}} is scoped (renamed 
> to {{?/E}}) and the two sequence elements have no variables in common. 
> Substituting the bindings from the first sequence element into the sub-query 
> therefore has no effect, and the expensive sub-query (note the {{top}} 
> operator) gets executed in full for every single LHS solution.
> It is unclear from the discussion thread so far whether this is just a badly 
> written query, and we don't have an example dataset that demonstrates the 
> performance problem. Still, just looking at the algebra, it seems we would 
> be better off avoiding {{sequence}} in favour of a plain {{join}} in a case 
> like this.
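The cost difference described in the issue can be illustrated with a toy model. This is a minimal sketch in Python, not ARQ's implementation: `CountingSubquery`, `sequence_join`, `plain_join` and `choose_strategy` are hypothetical names invented for the illustration, and the shared-variable check is only one possible conservative rule, assumed here for the sake of the example. The sketch counts how often the expensive right-hand side is evaluated under each strategy when the two sides share no visible variables.

```python
from typing import Dict, List, Set

Binding = Dict[str, str]


def compatible(a: Binding, b: Binding) -> bool:
    """Two bindings are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)


class CountingSubquery:
    """Hypothetical stand-in for an expensive RHS such as a (top ...) sub-query."""

    def __init__(self, rows: List[Binding]) -> None:
        self.rows = rows
        self.evaluations = 0  # how many times the RHS was executed in full

    def evaluate(self, substitution: Binding) -> List[Binding]:
        self.evaluations += 1
        # Substitution only restricts variables the sub-query exposes.
        # If it exposes none of them, every row comes back every time.
        return [r for r in self.rows if compatible(substitution, r)]


def sequence_join(lhs: List[Binding], rhs: CountingSubquery) -> List[Binding]:
    """Index-join style: re-evaluate the RHS once per LHS solution."""
    out = []
    for left in lhs:
        for right in rhs.evaluate(left):
            out.append({**left, **right})
    return out


def plain_join(lhs: List[Binding], rhs: CountingSubquery) -> List[Binding]:
    """Evaluate the RHS once, then combine compatible solutions."""
    right_rows = rhs.evaluate({})
    return [{**l, **r} for l in lhs for r in right_rows if compatible(l, r)]


def choose_strategy(lhs_vars: Set[str], rhs_visible_vars: Set[str]) -> str:
    """Assumed conservative rule: only use sequence when substitution can help."""
    return "sequence" if lhs_vars & rhs_visible_vars else "join"
```

With four LHS solutions for `?E` and a sub-query that exposes only `?O` and `?T`, both strategies return the same rows, but the sequence form evaluates the sub-query four times and the plain join evaluates it once. The model ignores everything else a real optimizer would weigh, such as the cost of materialising the right-hand side.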



--
This message was sent by Atlassian JIRA
(v6.2#6252)
