Re: Diagnosing DERBY-47 and DERBY-713 (queries with IN (?,?) filters)

Army Mon, 02 Oct 2006 13:55:18 -0700

James Synge wrote:

Yes, though I've been finding it hard to understand the manner in which the
the query plan is transformed by the optimizer.  For example, how

an index scan gets inserted above (?) a base table,

The query plan that you see is generated as part of the "modifyAccessPaths"phase of compilation. *After* the optimizer has found a best access plan for aquery, it then goes through and calls "modifyAccessPaths" on all of the nodes inthe query tree. It is during this phase that the internal nodes (ex.IndexToBaseRowNode) are generated.

For tracing purposes, take a look at DMLStatementNode.optimize(). In thatmethod you can see a nice breakdown of the different phases of optimization(comments here are my own):


// Phase 1: Pre-process (occurs after binding has finished):
resultSet = resultSet.preprocess(getCompilerContext().getNumTables(),
                null, (FromList) null);

// Phase 2: Optimize (see below)
resultSet = resultSet.optimize(getDataDictionary(), null, 1.0d);

// Phase 3: Modify access paths, i.e. generate the nodes required
// to execute the access plan chosen in Phase 2.
resultSet = resultSet.modifyAccessPaths();

If you put a break point at the call to modifyAccessPaths() and then step intoit, that might help you understan more about how the final query tree isgenerated based on optimizer decisions.

and more generally, what are all the ways that this kind of alternativeapproach can be added to the mix and evaluated.

The "alternate approaches" which are evaluated during optimization consistprimarily of three things: a) join order (which table comes first), b) joinstrategy (hash or nested loop), and c) index usage (which index, if any, are wegoing to use).

For a general description of how the optimizer evaluates and chooses which"access path" is best, you might find it helpful to read "Section III" of thefollowing two documents:


DERBY-781_v1.html (attached to DERBY-781)
DERBY-805_v5.html (attached to DERBY-805)

These writeups are admittedly biased toward the specific issues to which theyare attached, but perhaps there's some info there to help you understand how thedifferent query plans ("approaches") are evaulated on a more general scale...

I'm beginning to see that one alternative access path that we couldgenerate would introduce a fake resultset that would be used as theouter table in a nested loop join with the base table (or one of itsindices) as the inner table. What I definitely can't see yet is howto "slip" that in to the optimizer.

I haven't dug into this approach enough to comment one way or the other, but asa general knee-jerk reaction I think the place to do this kind of thing would beduring the pre-processing phase. If you are able "transform" theInListOperatorNode into a meaningful ResultSet during pre-processing, then youshould ideally be able to feed that to the optimizer and let it (the optimizer)do its normal "magic" to figure out what join strategy to use and whether or notto use an index. But again, I say that without having thought about yoursuggested approach in any detail...

Hopefully that helps get you started, but if you have more questions on how theoverall optimize process works, please do keep asking...


Army

Re: Diagnosing DERBY-47 and DERBY-713 (queries with IN (?,?) filters)

Reply via email to