All, We had this topic raising from time to time, resulted in CORE-1488. So far there was no appropriate discussion and hence no decision made, although one might argue that the optimizer should be clever enough to make hints unnecessary. The PGSQL guys share the same opinion, BTW ;-)
As for me, the ideal optimizer is a purely theoretical thing that can never exist in the real world, but I'm *not* going to discuss this in depth now, neither I have time for a full featured implementation of hints, whatever it would be. However, I'd like to raise one point that's not generally related to the optimizer quality. There's a retrieval rule that can be hardly guessed by the optimizer and that can be specified by the user only. I mean whether the optimization strategy is targeted at retrieving the whole dataset ASAP or at retrieving some first part of rows ASAP, at the cost of slower retrieval of the whole dataset. In the database world, this is commonly known as ALL ROWS vs FIRST ROWS retrieval strategies. There's an obvious easy case -- FIRST / ROWS clause -- that clearly defines user intentions and that can be used by the optimizer to adjust the retrieval strategy. However, users may want the FIRST ROWS strategy even in generic queries without using the explicit rows limit. For example, it's often important to fetch the first screen of the grid and show those rows to the user immediately while fetching the subsequent (invisible) rows in the background, thus reducing application response times. A workaround with something like SELECT FIRST 1000000000 is possible, but it looks crappy. In Firebird, the ALL ROWS vs FIRST ROWS choice is mostly about SORT vs ORDER plans used for sorting/grouping. See CORE-1482 for example. But there may be other differences in plans, also targeted at different retrieval strategies (e.g. a hash join is slower in returning the first rows than a nested loop join). The good question is how Firebird behaves now from this point of view. Mostly it uses the ALL ROWS approach, and the underlying cost based optimization algorithms also deal with complete stream cardinalities. There are some heuristics based deviations like using the ORDER plan even when SORT would be faster, and originally I was thinking that we have some intermediate mode (DEFAULT / UNKNOWN ROWS), but later I came to conclusion that usage of more advanced statistics (e.g. index clustering factor that's already available in GSTAT output in v3) will change the behaviour in the right direction. So we may treat Firebird basically preferring the ALL ROWS mode now and becoming even stricter in this regard in the future. This makes it necessary to allow the FIRST ROWS mode when it's needed. I'm proposing the following: 1) FIRST ROWS mode is implicitly used when the FIRST / ROWS clause is in game. Also, FIRST ROWS mode is implicitly used for EXISTS / ANY subqueries. 2) An explicit clause is introduced to force the non-default mode (ALL ROWS for FIRST-claused queries and FIRST ROWS for other queries). Hating Oracle-like pseudo-comments with embedded hints and preferring explicit syntax, I suggest (inspired by DB2): SELECT ... [OPTIMIZE FOR {FIRST | ALL} ROWS] 3) Either PLAN or OPTIMIZE clause can be used, syntax error is thrown if both are specified. 4) Explicitly specified mode takes precedence over implicitly deducted one. 5) ALL ROWS mode is the default one for queries without FIRST / ROWS clause. Whether the default mode can be altered per database (via configuration file) or per session is discussable. 6) OPTIMIZE FOR FIRST <number> ROWS can be added later to better assist the optimizer in specific cases. 7) OPTIMIZE clause can be later extended for full featured hints if/when they will be agreed on. Possible extentions (off the top of my head): OPTIMIZE [NOT] USING INDEX <index> or OPTIMIZE USING HASH JOIN FOR (<table1>, <table2>) or OPTIMIZE ENFORCE JOIN ORDER etc As so far we speak about optimization of the final (user retrievable) datasets, it makes sense to use the OPTIMIZE clause at the top level only, i.e. for select statements but not for select expressions. This is similar to how WITH LOCK and FOR UPDATE clauses are used. However, full featured hints should be available to select expressions as well. I'm not sure whether we should choose a "protecting" syntax now and extend it (compatibly!) later, or allow FIRST / ALL ROWS rules for subqueries since the beginning, or parse the hints and throw context based errors if necessary. Your opinions are appreciated. Dmitry ------------------------------------------------------------------------------ Rapidly troubleshoot problems before they affect your business. Most IT organizations don't have a clear picture of how application performance affects their revenue. With AppDynamics, you get 100% visibility into your Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro! http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel