Optimizer issue, was: interested in working on DERBY-713

Rick Hillegas Tue, 20 Jun 2006 11:22:20 -0700

Hi Brent,

Thanks for the extra information. And thanks to Army for the pointers toextra documentation. Hopefully our optimizer guru, Jeff Lichtman, willjoin the discussion, but he may be on vacation. Please bear with me as Itry to understand your approach. I am going to use some shorthand todescribe the nodes in the query plan and not use the actual class namesin trunk/java/engine/org/apache/derby/impl/sql/execute, which can be abit obscure. I hope this doesn't confuse the issue.

It sounds to me from the bug description that the Optimizer is choosingthe following plan


Plan 1:

 TableScan

and that you would like to teach the Optimizer to choose this plan, instead:

Plan 2:

               Join
               /     \
 IndexScan   TableScan

I am not sure that Plan 2 will perform much better on the hard case inthe bug report. I'm afraid that 10,000 rows may still be scanned out ofthe index before being filtered and passed up to the Join node. It'squite possible that the Optimizer did, in fact, consider this plan andrejected it because it actually was more expensive. It looks to me asthough the IndexScan only has start and end qualifiers and doesn't havethe concept of nextKey. What you want to end up with, I think, is a planthat probes the index for a small number of key values rather thanscanning it end to end.

It would be interesting to build nextKey support into the IndexScan sothat the scan could be repositioned for IN lists. Somebody pleasecorrect me if the support is already there and I'm just not seeing it.Lacking that support, it might be simpler to build a plan out of theexisting tinker-toys. Something like the following might give you theprobing you need:


Plan 3:

                    Join
                     /   \
              Union  TableScan
              /     \
 IndexScan   IndexScan

Each IndexScan would probe for a single key value. If you could just getthe Opimizer to consider this plan, then I'm cautiously optimistic thatthe existing cost-based decisions would favor Plan 3 over Plans 1 and 2.


I hope this is helpful.

Regards,
-Rick


Brent Verner wrote:

Hiya Rick,

  Thanks for you helpful reply!  I spent some time over the weekend
snooping around (mostly) InListOperatorNode and OptimizerImpl, butnever found out _where_ the table scan was chosen over an index scan
for finding the IN matches...I really wanted to find it before begging
for someone to point out the obvious :-) Specifically, I got alltangled up around the dynamically generated classes which implementthe chosen query plan. Also, I didn't ever find any QueryPlan-likeclass, but I guess this what the dynamic class(es) implement.
 What did I want to do once I got there?  Pseudo-code says it best:

   // ISTR that the bound IN values are ordered and unique.  If not,
   // this should be done...

   if( column is indexed ){
       if( index is unique ){
           // this should be a guaranteed win.  We know we'll only get
           // one match per IN value
           do index scan
       }
       else if( table size > SOME_MAGIC_SIZE ){
           // SOME_MAGIC_SIZE would represent the size of (the) table
           // at which using the index would be reasonably known to
           // be more expensive than scanning the table.
           do index scan
       }
       else {
           do table scan
       }
   }
   else {
       do table scan
   }


 Is that a workable approach (excepting the SOME_MAGIC_SIZE bit)?
I'll probably have some time later this week to immerse myself in locatingthe magic location to apply this logic. A _little_ pointer would behelpful: maybe a brief overview of how a query plan is determined -- I
suspect his is somewhere around where trulyTheBestTableAccess (or whatever
it's really called :-)) is defined. I'd rather have to bang my headagainst this wall a bit, to get a better understanding of things, instead
of being pointed immediately to the solution -- my own version of smelling
the roses, I guess ;-)

Cheers!
 Brent


[2006-06-19 11:30] Rick Hillegas said:
| Hi Brent,
|| Sounds like you're off to a good start. From the initial bug report, it| looks like there's a good idea about which heuristic is being| mis-applied here. Once you've studied the optimizer papers, I recommend| that you post some high-level candidate solutions. Try to avoid| optimizer jargon and concentrate on simple descriptions:|| o What query plan would you rather see?| o What heuristic would the optimizer apply that would lead it to your| preferred plan?| o How would the optimizer decide to apply the new heuristic rather than| the old one?|| I think you'll get some good feedback if your post contains the phrase| "Attention, optimizer experts." I think that the optimizer enthusiasts| on the list will give you good feedback:|| o Maybe they can think of a better query plan or heuristic.
| o Maybe they can see some awkward corner cases in your  heuristic.
| o They can advise you on whether your heuristic will short-circuit other| optimizer choices.| o They can advise you on whether your heuristic will cause an explosion| in the time that the optimizer takes.|| Thanks for wanting to scratch this itch!|| Regards,
| -Rick
|| Brent Verner wrote:|| >Hi,
| >
| >    I've recently found need for an embedded java db and only Derby seems
| >even close to handling the task, however the broken query planning for IN
| >clauses makes it unusable :-(.  I've decided to eschew an embedded db
| >in favor of PG for now, but I'd really like to be able to use Derby in the
| >near(ish) future for deployment.
| >
| >    I'd like to try to fix the query planning around the IN clause.  I'm
| >reviewing the internals papers right now, but I'd appreciate any input
| >that might point me in the right direction :-).
| >
| >cheers!
| > Brent
| >
| >| >

Optimizer issue, was: interested in working on DERBY-713

Reply via email to