[jira] [Commented] (JENA-709) Index join strategy may need to be more conservative when some sequence elements are potentially expensive

Rob Vesse (JIRA) Wed, 04 Jun 2014 07:24:22 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017716#comment-14017716
 ]


Rob Vesse commented on JENA-709:
--------------------------------

This may be a non-issue. When I replied to the email thread and filed this 
issue, I was using sparql.org to generate the algebra. That service runs ARQ 
2.11.1, which I assume the email poster is also using.

However, when I use the latest trunk, i.e. the 2.11.2 release we are currently 
voting on, I get a {{join}} instead of a {{sequence}}. So this was possibly a 
bug that got fixed, by accident or by design, in the course of the last few 
months, though I don't know which bug would have caused the change.

Maybe fixed by JENA-705?

If this is the case, then this is probably a candidate for closing as Not a 
Problem.

> Index join strategy may need to be more conservative when some sequence 
> elements are potentially expensive
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: JENA-709
>                 URL: https://issues.apache.org/jira/browse/JENA-709
>             Project: Apache Jena
>          Issue Type: Brainstorming
>          Components: ARQ, Optimizer
>    Affects Versions: Jena 2.11.1
>            Reporter: Rob Vesse
>            Priority: Minor
>
> As noted in a discussion of a poorly performing query on a mailing list 
> thread (http://s.apache.org/cAn), there are cases where the introduction of 
> {{sequence}} can actually make the query slower when some elements in the 
> {{sequence}} are expensive to calculate, e.g. sub-queries.
> The example query given is:
> {noformat}
> SELECT DISTINCT ?O ?T  ?E
> WHERE
> {  
>   ?E a x:E. 
>   {
>     SELECT ?O ?T 
>     WHERE 
>     {
>       ?O :oE ?E ;
>             :oT ?T .
>     } 
>     ORDER BY DESC(?T)
>     LIMIT 3
>   }
> }
> {noformat}
> Which produces the following algebra:
> {noformat}
> (distinct
>  (project (?O ?T ?E)
>   (sequence
>    (bgp (triple ?E rdf:type x:E))
>    (project (?O ?T)
>     (top (3 (desc ?T))
>      (bgp
>       (triple ?O :oE ?/E)
>       (triple ?O :oT ?T)
>      ))))))
> {noformat}
> Because scoping leaves no common variables, substituting the bindings from 
> the first sequence element into the sub-query has no effect, so the 
> expensive sub-query (note the {{top}} operator) gets executed in full for 
> every single LHS solution.
> It is unclear from the discussion thread so far whether this is just a badly 
> written query, and we don't have an example dataset that demonstrates the 
> performance problems, but just looking at the algebra it seems like we would 
> be better off avoiding {{sequence}} in favour of a plain {{join}} in a case 
> like this.
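
The cost argument in the description can be illustrated with a toy model. The sketch below is plain Python, not Jena or ARQ code, and every name and data value in it is hypothetical. It assumes, as the description states, that {{sequence}} re-evaluates its right-hand side once per left-hand solution, while {{join}} evaluates it once. Because the inner ?E is renamed to ?/E, the sub-query result does not depend on the outer binding, so both strategies return the same rows and differ only in how often the sub-query runs.

```python
# Toy dataset of (subject, predicate, object) tuples; values are made up.
TRIPLES = [
    ("e1", "type", "E"), ("e2", "type", "E"), ("e3", "type", "E"),
    ("o1", "oE", "e1"), ("o1", "oT", 5),
    ("o2", "oE", "e2"), ("o2", "oT", 9),
    ("o3", "oE", "e3"), ("o3", "oT", 1),
    ("o4", "oE", "e1"), ("o4", "oT", 7),
]


def run_subquery(triples, counter):
    """Stand-in for the sub-query: project (?O ?T), ORDER BY DESC(?T), LIMIT 3."""
    counter["runs"] += 1  # count each full evaluation of the sub-query
    o_e = [(s, o) for (s, p, o) in triples if p == "oE"]
    o_t = [(s, o) for (s, p, o) in triples if p == "oT"]
    # Join the two triple patterns on ?O, then drop the hidden ?/E.
    rows = [(s1, t) for (s1, _e) in o_e for (s2, t) in o_t if s1 == s2]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:3]


def eval_sequence(lhs, triples, counter):
    """Model of 'sequence': feed each LHS solution into the RHS in turn."""
    out = []
    for e in lhs:
        # Substituting ?E changes nothing inside the sub-query (it uses ?/E),
        # so the full sub-query is executed again for every LHS solution.
        for (o, t) in run_subquery(triples, counter):
            out.append((o, t, e))
    return out


def eval_join(lhs, triples, counter):
    """Model of 'join': evaluate the RHS once, then combine with the LHS."""
    rhs = run_subquery(triples, counter)
    # No shared variables, so the join degenerates to a cross product.
    return [(o, t, e) for e in lhs for (o, t) in rhs]


# LHS of the sequence/join: solutions for ?E from (?E rdf:type x:E).
LHS = [s for (s, p, o) in TRIPLES if p == "type" and o == "E"]

seq_counter = {"runs": 0}
join_counter = {"runs": 0}
seq_rows = eval_sequence(LHS, TRIPLES, seq_counter)
join_rows = eval_join(LHS, TRIPLES, join_counter)

print(sorted(seq_rows) == sorted(join_rows))
print(seq_counter["runs"], join_counter["runs"])
```

With three LHS solutions, the sequence model runs the sub-query three times and the join model runs it once. In this model the gap grows linearly with the number of LHS solutions, which is the concern raised above.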



--
This message was sent by Atlassian JIRA
(v6.2#6252)
