Hiya Rick,
Thanks for you helpful reply! I spent some time over the weekend
snooping around (mostly) InListOperatorNode and OptimizerImpl, but
never found out _where_ the table scan was chosen over an index scan
for finding the IN matches...I really wanted to find it before begging
for someone to point out the obvious :-) Specifically, I got all
tangled up around the dynamically generated classes which implement
the chosen query plan. Also, I didn't ever find any QueryPlan-like
class, but I guess this what the dynamic class(es) implement.
What did I want to do once I got there? Pseudo-code says it best:
// ISTR that the bound IN values are ordered and unique. If not,
// this should be done...
if( column is indexed ){
if( index is unique ){
// this should be a guaranteed win. We know we'll only get
// one match per IN value
do index scan
}
else if( table size > SOME_MAGIC_SIZE ){
// SOME_MAGIC_SIZE would represent the size of (the) table
// at which using the index would be reasonably known to
// be more expensive than scanning the table.
do index scan
}
else {
do table scan
}
}
else {
do table scan
}
Is that a workable approach (excepting the SOME_MAGIC_SIZE bit)?
I'll probably have some time later this week to immerse myself in locating
the magic location to apply this logic. A _little_ pointer would be
helpful: maybe a brief overview of how a query plan is determined -- I
suspect his is somewhere around where trulyTheBestTableAccess (or whatever
it's really called :-)) is defined. I'd rather have to bang my head
against this wall a bit, to get a better understanding of things, instead
of being pointed immediately to the solution -- my own version of smelling
the roses, I guess ;-)
Cheers!
Brent
[2006-06-19 11:30] Rick Hillegas said:
| Hi Brent,
|
| Sounds like you're off to a good start. From the initial bug report, it
| looks like there's a good idea about which heuristic is being
| mis-applied here. Once you've studied the optimizer papers, I recommend
| that you post some high-level candidate solutions. Try to avoid
| optimizer jargon and concentrate on simple descriptions:
|
| o What query plan would you rather see?
| o What heuristic would the optimizer apply that would lead it to your
| preferred plan?
| o How would the optimizer decide to apply the new heuristic rather than
| the old one?
|
| I think you'll get some good feedback if your post contains the phrase
| "Attention, optimizer experts." I think that the optimizer enthusiasts
| on the list will give you good feedback:
|
| o Maybe they can think of a better query plan or heuristic.
| o Maybe they can see some awkward corner cases in your heuristic.
| o They can advise you on whether your heuristic will short-circuit other
| optimizer choices.
| o They can advise you on whether your heuristic will cause an explosion
| in the time that the optimizer takes.
|
| Thanks for wanting to scratch this itch!
|
| Regards,
| -Rick
|
| Brent Verner wrote:
|
| >Hi,
| >
| > I've recently found need for an embedded java db and only Derby seems
| >even close to handling the task, however the broken query planning for IN
| >clauses makes it unusable :-(. I've decided to eschew an embedded db
| >in favor of PG for now, but I'd really like to be able to use Derby in the
| >near(ish) future for deployment.
| >
| > I'd like to try to fix the query planning around the IN clause. I'm
| >reviewing the internals papers right now, but I'd appreciate any input
| >that might point me in the right direction :-).
| >
| >cheers!
| > Brent
| >
| >
| >