[jira] [Commented] (JENA-709) Index join strategy may need to be more conservative when some sequence elements are potentially expensive

Andy Seaborne (JIRA) Wed, 04 Jun 2014 02:15:27 -0700

    [ https://issues.apache.org/jira/browse/JENA-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017518#comment-14017518 ]


Andy Seaborne commented on JENA-709:
------------------------------------

Agreed - it might as well do a loop join because it's a cross product.

If there are common variables, it does avoid using `sequence`:

{noformat}
PREFIX : <http://example/> 

SELECT * {
  ?s :p ?o .
  { SELECT ?o { ?s ?p ?o } ORDER BY ?o LIMIT 5 }
}
{noformat}
{noformat}
(prefix ((: <http://example/>))
  (join
    (bgp (triple ?s :p ?o))
    (project (?o)
      (top (5 ?o)
        (bgp (triple ?/s ?/p ?o))))))
{noformat}
So it looks more like it should use a plain `join` in the reported case anyway.

Or reverse the join order, because putting the sub-query first gives:
{noformat}
PREFIX : <http://example/> 

SELECT * {
  { SELECT ?o { ?s ?p ?o } ORDER BY ?o LIMIT 5 }
  ?s :p ?o .
}
{noformat}
{noformat}
(prefix ((: <http://example/>))
  (sequence
    (project (?o)
      (top (5 ?o)
        (bgp (triple ?/s ?/p ?o))))
    (bgp (triple ?s :p ?o))))
{noformat}
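
A toy model of why the reversed order is cheap, as a rough sketch only: the sub-query runs once, and each of its (at most 5) solutions is substituted into the triple pattern as an indexed lookup. The data, the `(predicate, object)` index and all function names below are hypothetical; none of this is ARQ code.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("a", ":p", 1), ("b", ":p", 2), ("c", ":q", 3),
    ("d", ":p", 7), ("e", ":q", 4), ("f", ":p", 5), ("g", ":p", 9),
]

def top_objects(triples, limit):
    """Models SELECT ?o { ?s ?p ?o } ORDER BY ?o LIMIT n, evaluated once."""
    return sorted(o for _, _, o in triples)[:limit]

def build_po_index(triples):
    """Assumed index from (predicate, object) to the matching subjects."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[(p, o)].append(s)
    return index

def sequence_subquery_first(triples, limit=5):
    """Sub-query first, then one substituted lookup of (?s :p ?o) per solution."""
    index = build_po_index(triples)
    lookups = 0
    results = []
    for o in top_objects(triples, limit):
        lookups += 1  # one indexed lookup with ?o already bound
        for s in index.get((":p", o), []):
            results.append({"s": s, "o": o})
    return results, lookups
```

In this model the number of lookups is bounded by the LIMIT, not by the size of the data.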

> Index join strategy may need to be more conservative when some sequence 
> elements are potentially expensive
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: JENA-709
>                 URL: https://issues.apache.org/jira/browse/JENA-709
>             Project: Apache Jena
>          Issue Type: Brainstorming
>          Components: ARQ, Optimizer
>            Reporter: Rob Vesse
>            Priority: Minor
>
> As noted in a discussion of a poorly performing query on a mailing list 
> thread (http://s.apache.org/cAn), there are cases where the introduction of 
> {{sequence}} can actually make the query slower when some elements in the 
> {{sequence}} are expensive to calculate, e.g. sub-queries.
> The example query given is:
> {noformat}
> SELECT DISTINCT ?O ?T  ?E
> WHERE
> {  
>   ?E a x:E. 
>   {
>     SELECT ?O ?T 
>     WHERE 
>     {
>       ?O :oE ?E ;
>             :oT ?T .
>     } 
>     ORDER BY DESC(?T)
>     LIMIT 3
>   }
> }
> {noformat}
> Which produces the following algebra:
> {noformat}
> (distinct
>  (project (?O ?T ?E)
>   (sequence
>    (bgp (triple ?E rdf:type x:E))
>    (project (?O ?T)
>     (top (3 (desc ?T))
>      (bgp
>       (triple ?O :oE ?/E)
>       (triple ?O :oT ?T)
>      ))))))
> {noformat}
> Because there are no common variables due to scoping, the substitution of the 
> bindings from the first sequence element into the sub-query has no effect, so 
> the expensive sub-query (note the {{top}} operator) gets executed in full for 
> every single LHS solution.
> It is unclear from the discussion thread so far whether this is just a badly 
> written query, and we don't have an example dataset that demonstrates the 
> performance problems, but just looking at the algebra it seems we would 
> be better off avoiding {{sequence}} in favour of a plain {{join}} in a 
> case like this.
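
The cost difference described in the issue can be illustrated with a small toy model. It is a sketch under assumptions, not ARQ's implementation: solutions are plain dicts, `CountingSubquery` is a hypothetical stand-in for the `top` sub-query, `sequence` is modelled as "re-evaluate the right side for every left solution", and `join` is modelled as "evaluate each side once, then loop".

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)

class CountingSubquery:
    """Hypothetical stand-in for the expensive sub-query; counts its runs."""

    def __init__(self, rows):
        self.rows = rows
        self.evaluations = 0

    def evaluate(self, binding=None):
        # The incoming binding is ignored on purpose: the sub-query's inner
        # variables are renamed (?/E), so substitution cannot restrict it.
        self.evaluations += 1
        return list(self.rows)

def run_sequence(lhs_rows, subquery):
    """Model of sequence: the sub-query runs once per LHS solution."""
    out = []
    for left in lhs_rows:
        for right in subquery.evaluate(left):
            if compatible(left, right):
                out.append({**left, **right})
    return out

def run_join(lhs_rows, subquery):
    """Model of a plain join: the sub-query runs once, then a loop join."""
    right_rows = subquery.evaluate()
    return [{**l, **r} for l in lhs_rows for r in right_rows if compatible(l, r)]
```

With no shared variables both strategies return the same cross product; in this model only the number of sub-query evaluations differs (one per LHS solution versus one in total).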



--
This message was sent by Atlassian JIRA
(v6.2#6252)
