James Synge wrote:

Yes, though I've been finding it hard to understand the manner in which the
the query plan is transformed by the optimizer.  For example, how
an index scan gets inserted above (?) a base table,

The query plan that you see is generated as part of the "modifyAccessPaths" phase of compilation. *After* the optimizer has found a best access plan for a query, it then goes through and calls "modifyAccessPaths" on all of the nodes in the query tree. It is during this phase that the internal nodes (ex. IndexToBaseRowNode) are generated.

For tracing purposes, take a look at DMLStatementNode.optimize(). In that method you can see a nice breakdown of the different phases of optimization (comments here are my own):

// Phase 1: Pre-process (occurs after binding has finished):
resultSet = resultSet.preprocess(getCompilerContext().getNumTables(),
                null, (FromList) null);

// Phase 2: Optimize (see below)
resultSet = resultSet.optimize(getDataDictionary(), null, 1.0d);

// Phase 3: Modify access paths, i.e. generate the nodes required
// to execute the access plan chosen in Phase 2.
resultSet = resultSet.modifyAccessPaths();

If you put a break point at the call to modifyAccessPaths() and then step into it, that might help you understan more about how the final query tree is generated based on optimizer decisions.

and more generally, what are all the ways that this kind of alternative approach can be added to the mix and evaluated.

The "alternate approaches" which are evaluated during optimization consist primarily of three things: a) join order (which table comes first), b) join strategy (hash or nested loop), and c) index usage (which index, if any, are we going to use).

For a general description of how the optimizer evaluates and chooses which "access path" is best, you might find it helpful to read "Section III" of the following two documents:

DERBY-781_v1.html (attached to DERBY-781)
DERBY-805_v5.html (attached to DERBY-805)

These writeups are admittedly biased toward the specific issues to which they are attached, but perhaps there's some info there to help you understand how the different query plans ("approaches") are evaulated on a more general scale...

I'm beginning to see that one alternative access path that we could generate would introduce a fake resultset that would be used as the outer table in a nested loop join with the base table (or one of its indices) as the inner table. What I definitely can't see yet is how to "slip" that in to the optimizer.

I haven't dug into this approach enough to comment one way or the other, but as a general knee-jerk reaction I think the place to do this kind of thing would be during the pre-processing phase. If you are able "transform" the InListOperatorNode into a meaningful ResultSet during pre-processing, then you should ideally be able to feed that to the optimizer and let it (the optimizer) do its normal "magic" to figure out what join strategy to use and whether or not to use an index. But again, I say that without having thought about your suggested approach in any detail...

Hopefully that helps get you started, but if you have more questions on how the overall optimize process works, please do keep asking...

Army

Reply via email to