Re: [jira] Commented: (DERBY-47) Some possible improvements to IN optimization

Army Mon, 18 Dec 2006 16:13:38 -0800

James Synge (JIRA) wrote:


The basic question is "when" should I replace the IN predicate?  The two basic
choices would seem to be during preprocessing or during 
getNextDecoratedPermutation.

Very good question. I guess I don't really know what the answer here is. Asyou say, the first way seems like it'd be easier to implement, so maybe that's agood one to start with. At the very least that will allow you to familiarizeyourself with the preprocessing/binding/rewrite code. After that you could thenlook at adding logic to determine when to use the VTI and when to fall back tonormal IN-list processing.

The former would likely be simpler to implement, but would my force solution
to always be used (which, given the poor cost estimates produced due to this
bug, might be a good thing).

Just to be clear, can I confirm that when you say "poor cost estimates" you aretalking about cost estimates like the following, which I see when I do:


  select * from admin.changes where id in (1, 20000)

The cost estimate that I see for this (when running against the db created inyour test program) is:


Project-Restrict ResultSet (2):
Number of opens = 1
Rows seen = 20000
Rows filtered = 19998
restriction = true
projection = false
        constructor time (milliseconds) = 0
        open time (milliseconds) = 0
        next time (milliseconds) = 0
        close time (milliseconds) = 0
        restriction time (milliseconds) = 0
        projection time (milliseconds) = 0
        optimizer estimated row count:         1500.38
        optimizer estimated cost:        22099.70

Source result set:
        Table Scan ResultSet for CHANGES a <snip>

Is this similar to what you see? The reason I ask is because there are someother issues (namely, DERBY-1905) where we are seeing *really* bad costestimates--far more ridiculous than the one shown above. So I just want to makesure we're on the same page w.r.t to what "poor" cost estimates means in thecontext of this issue. In the long run maybe that's an irrelevant question, butI would at least like to make sure that I'm seeing the same estimates that you are.

The latter would require replacing the Optimizable against which the IN
predicate is being evaluated with a substitute that represents a nested

join (probably).

Are we replacing the Optimizable or the IN predicate? In your comments forDERBY-2152 you wrote that your proposed approach would transform a query like:


 SELECT * FROM tableA WHERE columnB IN (constC, ?, ?, constD)

into something like:

 SELECT * FROM tableA WHERE columnB IN
  (SELECT vti.column1 FROM new ArgsToRowsVTI(SomeArgs))

So in this case I think we're replacing the IN predicate itself, not the"Optimizable against which the IN predicate is being evaluated" (i.e. tableA)?Is that correct? Sorry if I'm being dense here.

In any event, "switching" between these two transformations as part of a call togetNextDecoratedPermutation() seems like it could be tough. In the optimizerwork that I've done there has always been the implicit assumption that the querybeing optimized is "constant" throughout optimization; i.e. that we're alwaysoptimizing the *same* set of result set nodes--all that changes is the joinorder, index choices, and join strategies.

That doesn't mean it's impossible to try to optimize different "rewrites" of thequery--it just means that I haven't tried it so I don't know what it wouldentail. My gut feeling, though, is that you'd be better off trying to do thequery transformation as part of "preprocessing" as you mentioned above.

I really don't have enough of a grasp of the sub-query optimization to
know whether that is a good idea or not.

For what little it's worth, I can't help but wonder if it would be better torewrite the query as:


 SELECT * FROM
   tableA,
   (SELECT vti.column1 FROM new ArgsToRowsVTI(SomeArgs)) x
 WHERE tableA.columnB = x.column1

This seems like something the optimizer would be better able to optimize. Butthat said, I did a simple example query using a real table instead of the VTIand the optimizer seems to have treated the query similarly for both rewrites.So go with your proposed rewrite and see what happens :)

The next question is how do I pass information to the VTI implementation
regarding which parameters of the activation should it use as the values
in the virtual table?

The InListOperatorNode should have a list of values to which it applies. If youpass that list into the VTI constructor then I think you'll get the relevantvalues and parameters that apply to the IN-list...?


I'm currently looking to add the following if/else branch to InListOperatorNode:

                else if ((leftOperand instanceof ColumnReference) &&
                                 
rightOperandList.containsAllConstantOrParameterNodes())
                {


<snip>

I noticed that there's already the following "else if":

                else if ((leftOperand instanceof ColumnReference) &&
                                 rightOperandList.containsAllConstantNodes())
                {

Is it true that anytime the existing "else-if" condition is true, your new"else-if" condition will be true, as well? I.e a rightOperandList that contains"all constant nodes" would also always contain "all constant or parameternodes", wouldn't it? If so, then where would you add the new "else-if"? If youadd it before the existing one then the existing one will never get executed; ifyou add it after the existing one then your new VTI solution will only go intoeffect if the IN-list has at least one parameter. Is that what you are planning?

I realize this email has more questions than answers, so my apologies for lackof helpful information. Unfortunately, the particular area of the optimizerwith which you are working (preprocessing and/or rewrite) is one that is stillrelatively unexplored for me...


Army

Re: [jira] Commented: (DERBY-47) Some possible improvements to IN optimization

Reply via email to