[jira] Updated: (DERBY-805) Push join predicates into union and other set operations. DERBY-649 implemented scalar (single table) predicate pushdown. Adding join predicate push down could improve performance significantly.

A B (JIRA) Mon, 10 Apr 2006 14:25:32 -0700

     [ http://issues.apache.org/jira/browse/DERBY-805?page=all ]


A B updated DERBY-805:
----------------------

    Attachment: d805_phase4_v2.patch
                DERBY-805_v5.html

[ Attaching d805_phase4_v2.patch, which only has a 1-line diff from the v1 
patch ]

After stepping away from the code for the weekend and then re-reading the 
document for Phase 4, I noticed a small (one-line) error in the first version 
of my Phase 4 changes.

In the document I correctly wrote the following regarding the need to revert 
access paths for subtrees when an Optimizable's current permutation is found to 
be "not the best":

<begin_quote>

In order to fix this behavior, we have to add logic to save the "best plans" 
for an Optimizable's subtree before each new permutation, and then if the 
current permutation isn't the best one, we need to revert the entire subtree's 
plans back to what they were for the "best" permutation. 

<end_quote>

But the code that I wrote didn't quite match this statement.  In the following 
diff:

+        // If the final path that we considered for curOpt was _not_ the best
+        // path for this round, then we need to revert back to whatever the
+        // best plan for curOpt was this round.  Note that the cost estimate
+        // for bestAccessPath could be null here if the last path that we
+        // checked was the only one possible for this round.
+        if (!retval &&
+            (curOpt.getBestAccessPath().getCostEstimate() != null) &&
+            (curOpt.getCurrentAccessPath().getCostEstimate() != null))
+        {

the check for "!retval" means that we will only revert the plans if we have 
exhausted all permutations for the current Optimizable.  But as I wrote in the 
document, we need to check (and potentially revert) the subtree's plans after 
_each_  new permutation, not just after the final one.

To show why this matters, assume we have some Optimizable with three possible 
permutations P1, P2, and P3, of which P1 is the "best".  With the code as I 
wrote it in d805_phase4_v1.patch, we would end up doing the following:

  -- Pick permutation P1.
  -- Since P1 is the first permutation, there is no previous "best" path
     so there's nothing to save.
  -- Optimize P1 and get the cost; say it's 25.
  -- Since P1 isn't our final permutation, we don't revert.  So the plans
     corresponding to P1 are still saved at the Optimizable and its subtree.
  -- Pick permutation P2.
  -- Before optimizing P2, save the best paths for the current Optimizable's
     subtree; this means we'll save off the paths corresponding to P1.
  -- Optimize P2 and get the cost; say it's 50.
  -- Since P2 isn't our final permutation, we *don't* revert. So the plans
     corresponding to P2 are still saved at the Optimizable and its subtree.
  -- Pick permutation P3.
  -- Before optimizing P3, save the best paths for the current Optimizable's
     subtree; this means we'll save off the paths corresponding to *P2*.
     (This is wrong.)
  -- Optimize P3 and get the cost; say it's 75.
  -- Since P3 is our final permutation, and since it is NOT the best permutation
     that we found, we'll now revert the paths of this Optimizable's subtree.
     That mean's we'll load the plans that we most recently saved and
     generate the query plan based on those.  But that will give us the
     plans for _P2_, when what we wanted was the plans for _P1_.

So d805_phase4_v2.patch fixes this by removing the check for "!retval" in the 
above diff.  Then after we optimize P2 and get the cost, we'll see that it's 
not the best so we'll immediately revert the paths back to what they were for 
P1.  The P1 paths (instead of the P2 paths) will then get saved off before 
optimization of P3 starts, and thus when we do the "revert" after P3, we'll 
load the correct paths (P1).

As it turns out this particular change doesn't change any of the other Phase 4 
behavior.  The reason is that an Optimizable that is not a FromBaseTable only 
(currently) has two permutations--nested loop join and hash join--and thus the 
above scenario can't currently happen. FromBaseTable's can have more than two 
permutations, but since they don't have subtrees beneath them this whole 
save/revert logic is not required.  Thus even though the v1 patch was 
technically incorrect, everything ran as expected.  For accuracy, though, I 
think this small error should be fixed, so that's what the second version of 
the phase 4 patch (d805_phase4_v2.patch) does.

I've also added a _v5.html document that is almost identical to _v4 except for 
two things: 1) the small change that I just described, and 2) I've included a 
note with the info about index statistics that Andrew posted to derby-dev 
(thanks to Bryan for asking the question and to Andrew for providing the 
answer).

Just to be safe I re-ran derbyall on Red Hat Linux with ibm142 sane jars and 
saw no failures.


> Push join predicates into union and other set operations. DERBY-649 
> implemented scalar (single table) predicate pushdown. Adding join predicate 
> push down could improve performance significantly.
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-805
>          URL: http://issues.apache.org/jira/browse/DERBY-805
>      Project: Derby
>         Type: Sub-task

>   Components: SQL
>     Versions: 10.1.2.0, 10.2.0.0
>  Environment: generic
>     Reporter: Satheesh Bandaram
>     Assignee: A B
>      Fix For: 10.2.0.0
>  Attachments: DERBY-805.html, DERBY-805_v2.html, DERBY-805_v3.html, 
> DERBY-805_v4.html, DERBY-805_v5.html, d805_phase1_v1.patch, 
> d805_phase1_v1.stat, d805_phase1_v2.patch, d805_phase1_v2.stat, 
> d805_phase1_v3.patch, d805_phase1_v3.stat, d805_phase2_v1.patch, 
> d805_phase2_v1.stat, d805_phase3_v1.patch, d805_phase3_v1.stat, 
> d805_phase4_v1.patch, d805_phase4_v1.stat, d805_phase4_v2.patch, 
> phase2_javadocFix.patch, predPushdown_testFix.patch
>
> Fix for DERBY-649 implemented scalar (single table) predicate push down into 
> UNIONs. While this improves performance for one set of queries, ability to 
> push join-predicates further improves Derby performance by enabling use of 
> indices where possible.
> For example,
> create view V1 as select i, j from T1 union all select i,j from T2; 
> create view V2 as select a,b from T3 union all select a,b from T4; 
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5); 
> For a query like
> select * from V1, V2 where V1.j = V2.b and V1.i =1;
> If the join order choosen is V1,V2, V1 can use index on V1.i (if present) 
> following fix for DERBY-649. But if there is a index on V2.b also, Derby 
> currently can't use that index. By pushing join predicate, Derby would be 
> able to use the index and improve performance. Some of the queries I have 
> seen (not the one shown here...) could improve from 70-120 seconds to about 
> one second.
> Note there is a good comment by Jeff Lichtman about join-predicate push down. 
> I am copying parts of it here for completeness of this report: (Modified)
> If predicate push down is done during optimization, it would be possible to 
> push joins into the union as long as it's in the right place in the join 
> order.
> For example:
> create view v as select * from t1 union all select * from t2;
> select * from v, t3 where v.c1 = t3.c2;
> In this select, if t3 is the outer table then the qualification could be 
> pushed into the union and optimized there, but if t3 is the inner table the 
> qualification can't be pushed into the union.
> If the pushing is done at preprocess time (i.e. before optimization) it is 
> impossible to know whether a join qualification like this can be safely 
> pushed.
> There's a comment in UnionNode.optimizeIt() saying:
> /* RESOLVE - don't try to push predicated through for now */
> This is where I'd expect to see something for pushing predicates into the 
> union during optimization.
> BTW, the business of pushing and pulling predicates during optimization can 
> be hard to understand and debug, so maybe it's best to only handle the simple 
> cases and do it during preprocessing.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Updated: (DERBY-805) Push join predicates into union and other set operations. DERBY-649 implemented scalar (single table) predicate pushdown. Adding join predicate push down could improve performance significantly.

Reply via email to