James Synge (JIRA) wrote:
The basic question is "when" should I replace the IN predicate? The two basic
choices would seem to be during preprocessing or during
getNextDecoratedPermutation.
Very good question. I guess I don't really know what the answer here is. As
you say, the first way seems like it'd be easier to implement, so maybe that's a
good one to start with. At the very least that will allow you to familiarize
yourself with the preprocessing/binding/rewrite code. After that you could then
look at adding logic to determine when to use the VTI and when to fall back to
normal IN-list processing.
The former would likely be simpler to implement, but would my force solution
to always be used (which, given the poor cost estimates produced due to this
bug, might be a good thing).
Just to be clear, can I confirm that when you say "poor cost estimates" you are
talking about cost estimates like the following, which I see when I do:
select * from admin.changes where id in (1, 20000)
The cost estimate that I see for this (when running against the db created in
your test program) is:
Project-Restrict ResultSet (2):
Number of opens = 1
Rows seen = 20000
Rows filtered = 19998
restriction = true
projection = false
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 1500.38
optimizer estimated cost: 22099.70
Source result set:
Table Scan ResultSet for CHANGES a <snip>
Is this similar to what you see? The reason I ask is because there are some
other issues (namely, DERBY-1905) where we are seeing *really* bad cost
estimates--far more ridiculous than the one shown above. So I just want to make
sure we're on the same page w.r.t to what "poor" cost estimates means in the
context of this issue. In the long run maybe that's an irrelevant question, but
I would at least like to make sure that I'm seeing the same estimates that you are.
The latter would require replacing the Optimizable against which the IN
predicate is being evaluated with a substitute that represents a nested
join (probably).
Are we replacing the Optimizable or the IN predicate? In your comments for
DERBY-2152 you wrote that your proposed approach would transform a query like:
SELECT * FROM tableA WHERE columnB IN (constC, ?, ?, constD)
into something like:
SELECT * FROM tableA WHERE columnB IN
(SELECT vti.column1 FROM new ArgsToRowsVTI(SomeArgs))
So in this case I think we're replacing the IN predicate itself, not the
"Optimizable against which the IN predicate is being evaluated" (i.e. tableA)?
Is that correct? Sorry if I'm being dense here.
In any event, "switching" between these two transformations as part of a call to
getNextDecoratedPermutation() seems like it could be tough. In the optimizer
work that I've done there has always been the implicit assumption that the query
being optimized is "constant" throughout optimization; i.e. that we're always
optimizing the *same* set of result set nodes--all that changes is the join
order, index choices, and join strategies.
That doesn't mean it's impossible to try to optimize different "rewrites" of the
query--it just means that I haven't tried it so I don't know what it would
entail. My gut feeling, though, is that you'd be better off trying to do the
query transformation as part of "preprocessing" as you mentioned above.
I really don't have enough of a grasp of the sub-query optimization to
know whether that is a good idea or not.
For what little it's worth, I can't help but wonder if it would be better to
rewrite the query as:
SELECT * FROM
tableA,
(SELECT vti.column1 FROM new ArgsToRowsVTI(SomeArgs)) x
WHERE tableA.columnB = x.column1
This seems like something the optimizer would be better able to optimize. But
that said, I did a simple example query using a real table instead of the VTI
and the optimizer seems to have treated the query similarly for both rewrites.
So go with your proposed rewrite and see what happens :)
The next question is how do I pass information to the VTI implementation
regarding which parameters of the activation should it use as the values
in the virtual table?
The InListOperatorNode should have a list of values to which it applies. If you
pass that list into the VTI constructor then I think you'll get the relevant
values and parameters that apply to the IN-list...?
I'm currently looking to add the following if/else branch to InListOperatorNode:
else if ((leftOperand instanceof ColumnReference) &&
rightOperandList.containsAllConstantOrParameterNodes())
{
<snip>
I noticed that there's already the following "else if":
else if ((leftOperand instanceof ColumnReference) &&
rightOperandList.containsAllConstantNodes())
{
Is it true that anytime the existing "else-if" condition is true, your new
"else-if" condition will be true, as well? I.e a rightOperandList that contains
"all constant nodes" would also always contain "all constant or parameter
nodes", wouldn't it? If so, then where would you add the new "else-if"? If you
add it before the existing one then the existing one will never get executed; if
you add it after the existing one then your new VTI solution will only go into
effect if the IN-list has at least one parameter. Is that what you are planning?
I realize this email has more questions than answers, so my apologies for lack
of helpful information. Unfortunately, the particular area of the optimizer
with which you are working (preprocessing and/or rewrite) is one that is still
relatively unexplored for me...
Army