[ https://issues.apache.org/jira/browse/JENA-653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne updated JENA-653:
-------------------------------

    Description: 
The filter placement strategy of pushing a filter into both arms of a union 
only works when the filter is directly over the union. If the filter is 
further out, and the filter expression uses variables that are not bound in 
the union arms, the filter should be left outside, because it may be placed 
elsewhere later.

This shows up in a {{sequence}} where the union comes before a pattern that 
does bind the variable.
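
As a rough illustration of that condition, here is a minimal Python sketch. It 
is not ARQ code: the function name {{can_push_into_union}} and the use of plain 
sets of variable names are assumptions made for this example. It treats a push 
into the arms as safe only when every arm is certain to bind every variable the 
filter mentions, which is a conservative check for the case where the filter is 
not directly over the union.

```python
# Hypothetical, simplified model of the safety condition; not Jena's optimizer.

def can_push_into_union(filter_vars, arm_vars_list):
    """True only if every union arm binds every variable used by the filter."""
    return all(filter_vars <= arm_vars for arm_vars in arm_vars_list)


arm_a = {"item"}  # { ?item rdf:type ex:type_a }
arm_b = {"item"}  # { ?item rdf:type ex:type_b }

# FILTER(str(?label) = "a") mentions ?label, which neither arm binds.
print(can_push_into_union({"label"}, [arm_a, arm_b]))  # False

# A filter over ?item alone could be evaluated inside each arm.
print(can_push_into_union({"item"}, [arm_a, arm_b]))   # True
```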

Example - the key feature is that the {{union}} is first and is joined to a BGP 
with the union on the LHS and BGP on the RHS. If the join order is reversed, 
then a reasonable and correct optimization is performed.
{noformat}
PREFIX  ex:   <http://ex.org/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT  *
WHERE
  {   { ?item rdf:type ex:type_a }
    UNION
      { ?item rdf:type ex:type_b }
    ?item ex:label ?label
    FILTER ( str(?label) = "a" )
  }
{noformat}
Algebra, after join strategy, before filter placement. The join style is a 
{{sequence}}:
{noformat}
(prefix ((ex: <http://ex.org/>)
         (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
  (filter (= (str ?label) "a")
    (sequence
      (union
        (bgp (triple ?item rdf:type ex:type_a))
        (bgp (triple ?item rdf:type ex:type_b)))
      (bgp (triple ?item ex:label ?label)))))
{noformat}
which is optimized (wrongly) as:
{noformat}
(prefix ((ex: <http://ex.org/>)
         (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
  (sequence
    (union
      (filter (= (str ?label) "a")
        (bgp (triple ?item rdf:type ex:type_a)))
      (filter (= (str ?label) "a")
        (bgp (triple ?item rdf:type ex:type_b))))
    (bgp (triple ?item ex:label ?label))))
{noformat}
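The {{(filter (= (str ?label) "a") ...)}} is applied inside the arms of the 
{{union}}, not over the later {{(bgp (triple ?item ex:label ?label))}}. Because 
{{?label}} is unbound inside the union arms, {{str(?label)}} is an evaluation 
error there and the filter rejects every solution.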
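
To make the effect concrete, the following toy evaluator in Python runs both 
plans over a four-triple in-memory graph. Everything in it (the data, the 
helper names {{triple}}, {{union}}, {{sequence}}, {{filter_label_a}}) is invented 
for this illustration and only mimics left-to-right {{sequence}} evaluation; it 
is not how ARQ is implemented.

```python
# Toy evaluator over a tiny in-memory graph; an illustration only, not ARQ.
DATA = [
    ("x1", "type", "type_a"),
    ("x2", "type", "type_b"),
    ("x1", "label", "a"),
    ("x2", "label", "b"),
]


def triple(s, p, o):
    """Operator for the pattern (s p o); terms starting with '?' are variables."""
    def run(binding):
        for fact in DATA:
            new = dict(binding)
            ok = True
            for term, value in zip((s, p, o), fact):
                if term.startswith("?"):
                    # Bind the variable, or check it against an existing binding.
                    if new.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:
                    ok = False
                    break
            if ok:
                yield new
    return run


def union(*arms):
    """Concatenate the solutions of each arm for the same input binding."""
    def run(binding):
        for arm in arms:
            yield from arm(binding)
    return run


def sequence(*steps):
    """Feed the solutions of each step into the next one, left to right."""
    def run(binding):
        rows = [binding]
        for step in steps:
            rows = [out for row in rows for out in step(row)]
        yield from rows
    return run


def filter_label_a(op):
    """(filter (= (str ?label) "a") op); an unbound ?label rejects the row."""
    def run(binding):
        for row in op(binding):
            if row.get("?label") == "a":
                yield row
    return run


type_a = triple("?item", "type", "type_a")
type_b = triple("?item", "type", "type_b")
label = triple("?item", "label", "?label")

# Plan before filter placement, and the wrongly optimized plan.
original = filter_label_a(sequence(union(type_a, type_b), label))
wrong = sequence(union(filter_label_a(type_a), filter_label_a(type_b)), label)

original_rows = list(original({}))
wrong_rows = list(wrong({}))
print(original_rows)  # [{'?item': 'x1', '?label': 'a'}]
print(wrong_rows)     # []
```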

The problem is in the relationship of {{sequence}} and {{union}}. The {{union}} 
can't be treated in isolation with the current design.  Either the {{union}} 
needs a better placement calculated, or failing that (less preferable), a flag 
is needed to change the way filters are pushed down into a union depending on 
the nesting context.
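
One possible shape for a "better placement" is sketched below in Python. It is 
only an assumption about what such a calculation could look like, not the fix 
adopted in ARQ: the tuple encoding of operators and the names {{fixed_vars}} and 
{{placement_index}} are hypothetical. The idea is to walk the steps of a 
{{sequence}}, track which variables are certainly bound so far, and attach the 
filter at the first step after which all of its variables are bound. A 
{{union}} contributes only the variables bound in both arms.

```python
# Hypothetical placement calculation for a filter over a sequence; not ARQ code.

def fixed_vars(op):
    """Variables certainly bound by `op`: ("bgp", vars) or ("union", left, right)."""
    kind = op[0]
    if kind == "bgp":
        return set(op[1])
    if kind == "union":
        # Only variables bound in both arms are certain after the union.
        return fixed_vars(op[1]) & fixed_vars(op[2])
    raise ValueError(f"unsupported operator: {kind}")


def placement_index(filter_vars, steps):
    """Index of the first step after which all filter variables are bound, else None."""
    bound = set()
    for i, step in enumerate(steps):
        bound |= fixed_vars(step)
        if filter_vars <= bound:
            return i
    return None


steps = [
    ("union", ("bgp", ["item"]), ("bgp", ["item"])),  # the two rdf:type arms
    ("bgp", ["item", "label"]),                       # ?item ex:label ?label
]

print(placement_index({"label"}, steps))  # 1: place over the later BGP
print(placement_index({"item"}, steps))   # 0: may go on (or into) the union
print(placement_index({"other"}, steps))  # None: leave the filter outside
```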


  was:
The filter placement strategy of pushing a filter into both arms of a union 
only works when the filter is directly over the union. If the filter is 
further out, and the filter expression uses variables that are not bound in 
the union arms, the filter should be left outside, because it may be placed 
elsewhere later.

This shows up in a {{sequence}} where the union comes before a pattern that 
does bind the variable.

Example - the key feature is that the {{union}} is first and is joined to a BGP 
with the union on the LHS and BGP on the RHS.
{noformat}
PREFIX  ex:   <http://ex.org/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT  *
WHERE
  {   { ?item rdf:type ex:type_a }
    UNION
      { ?item rdf:type ex:type_b }
    ?item ex:label ?label
    FILTER ( str(?label) = "a" )
  }
{noformat}
Algebra, after join strategy, before filter placement. The join style is a 
{{sequence}}:
{noformat}
(prefix ((ex: <http://ex.org/>)
         (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
  (filter (= (str ?label) "a")
    (sequence
      (union
        (bgp (triple ?item rdf:type ex:type_a))
        (bgp (triple ?item rdf:type ex:type_b)))
      (bgp (triple ?item ex:label ?label)))))
{noformat}
which is optimized (wrongly) as:
{noformat}
(prefix ((ex: <http://ex.org/>)
         (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
  (sequence
    (union
      (filter (= (str ?label) "a")
        (bgp (triple ?item rdf:type ex:type_a)))
      (filter (= (str ?label) "a")
        (bgp (triple ?item rdf:type ex:type_b))))
    (bgp (triple ?item ex:label ?label))))
{noformat}
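The {{(filter (= (str ?label) "a") ...)}} is applied inside the arms of the 
{{union}}, not over the later {{(bgp (triple ?item ex:label ?label))}}. Because 
{{?label}} is unbound inside the union arms, {{str(?label)}} is an evaluation 
error there and the filter rejects every solution.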

The problem is in the relationship of {{sequence}} and {{union}}. The {{union}} 
can't be treated in isolation with the current design.  Either the {{union}} 
needs a better placement calculated, or failing that (less preferable), a flag 
is needed to change the way filters are pushed down into a union depending on 
the nesting context.



> Filter Placement into union pushes down whole filter but this fails in a 
> sequence.
> ----------------------------------------------------------------------------------
>
>                 Key: JENA-653
>                 URL: https://issues.apache.org/jira/browse/JENA-653
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 2.11.1
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne


