[ http://issues.apache.org/jira/browse/DERBY-805?page=all ]
A B updated DERBY-805:
----------------------
Attachment: d805_phase4_v2.patch
DERBY-805_v5.html
[ Attaching d805_phase4_v2.patch, which only has a 1-line diff from the v1
patch ]
After stepping away from the code for the weekend and then re-reading the
document for Phase 4, I noticed a small (one-line) error in the first version
of my Phase 4 changes.
In the document I correctly wrote the following regarding the need to revert
access paths for subtrees when an Optimizable's current permutation is found to
be "not the best":
<begin_quote>
In order to fix this behavior, we have to add logic to save the "best plans"
for an Optimizable's subtree before each new permutation, and then if the
current permutation isn't the best one, we need to revert the entire subtree's
plans back to what they were for the "best" permutation.
<end_quote>
But the code that I wrote didn't quite match this statement. In the following
diff:
+ // If the final path that we considered for curOpt was _not_ the best
+ // path for this round, then we need to revert back to whatever the
+ // best plan for curOpt was this round. Note that the cost estimate
+ // for bestAccessPath could be null here if the last path that we
+ // checked was the only one possible for this round.
+ if (!retval &&
+ (curOpt.getBestAccessPath().getCostEstimate() != null) &&
+ (curOpt.getCurrentAccessPath().getCostEstimate() != null))
+ {
the check for "!retval" means that we will only revert the plans if we have
exhausted all permutations for the current Optimizable. But as I wrote in the
document, we need to check (and potentially revert) the subtree's plans after
_each_ new permutation, not just after the final one.
To show why this matters, assume we have some Optimizable with three possible
permutations P1, P2, and P3, of which P1 is the "best". With the code as I
wrote it in d805_phase4_v1.patch, we would end up doing the following:
-- Pick permutation P1.
-- Since P1 is the first permutation, there is no previous "best" path
so there's nothing to save.
-- Optimize P1 and get the cost; say it's 25.
-- Since P1 isn't our final permutation, we don't revert. So the plans
corresponding to P1 are still saved at the Optimizable and its subtree.
-- Pick permutation P2.
-- Before optimizing P2, save the best paths for the current Optimizable's
subtree; this means we'll save off the paths corresponding to P1.
-- Optimize P2 and get the cost; say it's 50.
-- Since P2 isn't our final permutation, we *don't* revert. So the plans
corresponding to P2 are still saved at the Optimizable and its subtree.
-- Pick permutation P3.
-- Before optimizing P3, save the best paths for the current Optimizable's
subtree; this means we'll save off the paths corresponding to *P2*.
(This is wrong.)
-- Optimize P3 and get the cost; say it's 75.
-- Since P3 is our final permutation, and since it is NOT the best permutation
that we found, we'll now revert the paths of this Optimizable's subtree.
That mean's we'll load the plans that we most recently saved and
generate the query plan based on those. But that will give us the
plans for _P2_, when what we wanted was the plans for _P1_.
So d805_phase4_v2.patch fixes this by removing the check for "!retval" in the
above diff. Then after we optimize P2 and get the cost, we'll see that it's
not the best so we'll immediately revert the paths back to what they were for
P1. The P1 paths (instead of the P2 paths) will then get saved off before
optimization of P3 starts, and thus when we do the "revert" after P3, we'll
load the correct paths (P1).
As it turns out this particular change doesn't change any of the other Phase 4
behavior. The reason is that an Optimizable that is not a FromBaseTable only
(currently) has two permutations--nested loop join and hash join--and thus the
above scenario can't currently happen. FromBaseTable's can have more than two
permutations, but since they don't have subtrees beneath them this whole
save/revert logic is not required. Thus even though the v1 patch was
technically incorrect, everything ran as expected. For accuracy, though, I
think this small error should be fixed, so that's what the second version of
the phase 4 patch (d805_phase4_v2.patch) does.
I've also added a _v5.html document that is almost identical to _v4 except for
two things: 1) the small change that I just described, and 2) I've included a
note with the info about index statistics that Andrew posted to derby-dev
(thanks to Bryan for asking the question and to Andrew for providing the
answer).
Just to be safe I re-ran derbyall on Red Hat Linux with ibm142 sane jars and
saw no failures.
> Push join predicates into union and other set operations. DERBY-649
> implemented scalar (single table) predicate pushdown. Adding join predicate
> push down could improve performance significantly.
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: DERBY-805
> URL: http://issues.apache.org/jira/browse/DERBY-805
> Project: Derby
> Type: Sub-task
> Components: SQL
> Versions: 10.1.2.0, 10.2.0.0
> Environment: generic
> Reporter: Satheesh Bandaram
> Assignee: A B
> Fix For: 10.2.0.0
> Attachments: DERBY-805.html, DERBY-805_v2.html, DERBY-805_v3.html,
> DERBY-805_v4.html, DERBY-805_v5.html, d805_phase1_v1.patch,
> d805_phase1_v1.stat, d805_phase1_v2.patch, d805_phase1_v2.stat,
> d805_phase1_v3.patch, d805_phase1_v3.stat, d805_phase2_v1.patch,
> d805_phase2_v1.stat, d805_phase3_v1.patch, d805_phase3_v1.stat,
> d805_phase4_v1.patch, d805_phase4_v1.stat, d805_phase4_v2.patch,
> phase2_javadocFix.patch, predPushdown_testFix.patch
>
> Fix for DERBY-649 implemented scalar (single table) predicate push down into
> UNIONs. While this improves performance for one set of queries, ability to
> push join-predicates further improves Derby performance by enabling use of
> indices where possible.
> For example,
> create view V1 as select i, j from T1 union all select i,j from T2;
> create view V2 as select a,b from T3 union all select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like
> select * from V1, V2 where V1.j = V2.b and V1.i =1;
> If the join order choosen is V1,V2, V1 can use index on V1.i (if present)
> following fix for DERBY-649. But if there is a index on V2.b also, Derby
> currently can't use that index. By pushing join predicate, Derby would be
> able to use the index and improve performance. Some of the queries I have
> seen (not the one shown here...) could improve from 70-120 seconds to about
> one second.
> Note there is a good comment by Jeff Lichtman about join-predicate push down.
> I am copying parts of it here for completeness of this report: (Modified)
> If predicate push down is done during optimization, it would be possible to
> push joins into the union as long as it's in the right place in the join
> order.
> For example:
> create view v as select * from t1 union all select * from t2;
> select * from v, t3 where v.c1 = t3.c2;
> In this select, if t3 is the outer table then the qualification could be
> pushed into the union and optimized there, but if t3 is the inner table the
> qualification can't be pushed into the union.
> If the pushing is done at preprocess time (i.e. before optimization) it is
> impossible to know whether a join qualification like this can be safely
> pushed.
> There's a comment in UnionNode.optimizeIt() saying:
> /* RESOLVE - don't try to push predicated through for now */
> This is where I'd expect to see something for pushing predicates into the
> union during optimization.
> BTW, the business of pushing and pulling predicates during optimization can
> be hard to understand and debug, so maybe it's best to only handle the simple
> cases and do it during preprocessing.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira