James Synge wrote:
Yes, though I've been finding it hard to understand the manner in which the
the query plan is transformed by the optimizer. For example, how
an index scan gets inserted above (?) a base table,
The query plan that you see is generated as part of the "modifyAccessPaths"
phase of compilation. *After* the optimizer has found a best access plan for a
query, it then goes through and calls "modifyAccessPaths" on all of the nodes in
the query tree. It is during this phase that the internal nodes (ex.
IndexToBaseRowNode) are generated.
For tracing purposes, take a look at DMLStatementNode.optimize(). In that
method you can see a nice breakdown of the different phases of optimization
(comments here are my own):
// Phase 1: Pre-process (occurs after binding has finished):
resultSet = resultSet.preprocess(getCompilerContext().getNumTables(),
null, (FromList) null);
// Phase 2: Optimize (see below)
resultSet = resultSet.optimize(getDataDictionary(), null, 1.0d);
// Phase 3: Modify access paths, i.e. generate the nodes required
// to execute the access plan chosen in Phase 2.
resultSet = resultSet.modifyAccessPaths();
If you put a break point at the call to modifyAccessPaths() and then step into
it, that might help you understan more about how the final query tree is
generated based on optimizer decisions.
and more generally, what are all the ways that this kind of alternative
approach can be added to the mix and evaluated.
The "alternate approaches" which are evaluated during optimization consist
primarily of three things: a) join order (which table comes first), b) join
strategy (hash or nested loop), and c) index usage (which index, if any, are we
going to use).
For a general description of how the optimizer evaluates and chooses which
"access path" is best, you might find it helpful to read "Section III" of the
following two documents:
DERBY-781_v1.html (attached to DERBY-781)
DERBY-805_v5.html (attached to DERBY-805)
These writeups are admittedly biased toward the specific issues to which they
are attached, but perhaps there's some info there to help you understand how the
different query plans ("approaches") are evaulated on a more general scale...
I'm beginning to see that one alternative access path that we could
generate would introduce a fake resultset that would be used as the
outer table in a nested loop join with the base table (or one of its
indices) as the inner table. What I definitely can't see yet is how
to "slip" that in to the optimizer.
I haven't dug into this approach enough to comment one way or the other, but as
a general knee-jerk reaction I think the place to do this kind of thing would be
during the pre-processing phase. If you are able "transform" the
InListOperatorNode into a meaningful ResultSet during pre-processing, then you
should ideally be able to feed that to the optimizer and let it (the optimizer)
do its normal "magic" to figure out what join strategy to use and whether or not
to use an index. But again, I say that without having thought about your
suggested approach in any detail...
Hopefully that helps get you started, but if you have more questions on how the
overall optimize process works, please do keep asking...
Army