All,

We had this topic raising from time to time, resulted in CORE-1488. So 
far there was no appropriate discussion and hence no decision made, 
although one might argue that the optimizer should be clever enough to 
make hints unnecessary. The PGSQL guys share the same opinion, BTW ;-)

As for me, the ideal optimizer is a purely theoretical thing that can 
never exist in the real world, but I'm *not* going to discuss this in 
depth now, neither I have time for a full featured implementation of 
hints, whatever it would be. However, I'd like to raise one point that's 
not generally related to the optimizer quality.

There's a retrieval rule that can be hardly guessed by the optimizer and 
that can be specified by the user only. I mean whether the optimization 
strategy is targeted at retrieving the whole dataset ASAP or at 
retrieving some first part of rows ASAP, at the cost of slower retrieval 
of the whole dataset. In the database world, this is commonly known as 
ALL ROWS vs FIRST ROWS retrieval strategies.

There's an obvious easy case -- FIRST / ROWS clause -- that clearly 
defines user intentions and that can be used by the optimizer to adjust 
the retrieval strategy. However, users may want the FIRST ROWS strategy 
even in generic queries without using the explicit rows limit. For 
example, it's often important to fetch the first screen of the grid and 
show those rows to the user immediately while fetching the subsequent 
(invisible) rows in the background, thus reducing application response 
times. A workaround with something like SELECT FIRST 1000000000 is 
possible, but it looks crappy.

In Firebird, the ALL ROWS vs FIRST ROWS choice is mostly about SORT vs 
ORDER plans used for sorting/grouping. See CORE-1482 for example. But 
there may be other differences in plans, also targeted at different 
retrieval strategies (e.g. a hash join is slower in returning the first 
rows than a nested loop join).

The good question is how Firebird behaves now from this point of view. 
Mostly it uses the ALL ROWS approach, and the underlying cost based 
optimization algorithms also deal with complete stream cardinalities. 
There are some heuristics based deviations like using the ORDER plan 
even when SORT would be faster, and originally I was thinking that we 
have some intermediate mode (DEFAULT / UNKNOWN ROWS), but later I came 
to conclusion that usage of more advanced statistics (e.g. index 
clustering factor that's already available in GSTAT output in v3) will 
change the behaviour in the right direction. So we may treat Firebird 
basically preferring the ALL ROWS mode now and becoming even stricter in 
this regard in the future.

This makes it necessary to allow the FIRST ROWS mode when it's needed. 
I'm proposing the following:

1) FIRST ROWS mode is implicitly used when the FIRST / ROWS clause is in 
game. Also, FIRST ROWS mode is implicitly used for EXISTS / ANY subqueries.

2) An explicit clause is introduced to force the non-default mode (ALL 
ROWS for FIRST-claused queries and FIRST ROWS for other queries). Hating 
Oracle-like pseudo-comments with embedded hints and preferring explicit 
syntax, I suggest (inspired by DB2):

SELECT ...
[OPTIMIZE FOR {FIRST | ALL} ROWS]

3) Either PLAN or OPTIMIZE clause can be used, syntax error is thrown if 
both are specified.

4) Explicitly specified mode takes precedence over implicitly deducted one.

5) ALL ROWS mode is the default one for queries without FIRST / ROWS 
clause. Whether the default mode can be altered per database (via 
configuration file) or per session is discussable.

6) OPTIMIZE FOR FIRST <number> ROWS can be added later to better assist 
the optimizer in specific cases.

7) OPTIMIZE clause can be later extended for full featured hints if/when 
they will be agreed on. Possible extentions (off the top of my head):

OPTIMIZE [NOT] USING INDEX <index>
or
OPTIMIZE USING HASH JOIN FOR (<table1>, <table2>)
or
OPTIMIZE ENFORCE JOIN ORDER
etc

As so far we speak about optimization of the final (user retrievable) 
datasets, it makes sense to use the OPTIMIZE clause at the top level 
only, i.e. for select statements but not for select expressions. This is 
similar to how WITH LOCK and FOR UPDATE clauses are used. However, full 
featured hints should be available to select expressions as well. I'm 
not sure whether we should choose a "protecting" syntax now and extend 
it (compatibly!) later, or allow FIRST / ALL ROWS rules for subqueries 
since the beginning, or parse the hints and throw context based errors 
if necessary.

Your opinions are appreciated.


Dmitry

------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel

Reply via email to