> >> It depends. Primary key may be stored more-or-less in regard to > >> physical disk location. And the measure how good is INDEX vs NATURAL > >> scan can be available to the optimizer (index clustering factor). > > > > Without the knowledge of way that Primary Keys map to storage order an > assumption based on same would be just as bad a assuming that UDF are by > default deterministic, no? > > I'm not talking about assumptions, the statistics either proves it or not.
I don't see what statistics will tell the engine. Unlike index distribution statistics, which provides some details on the commonality of values, I don't see how stats on the "orderedness" of a primary key provide any meaningful details, when the stat is anything other than 100% ordered -- never mind that I suspect that the cost of capturing/maintaining the stat will outweigh its value. The reality is that it is not the orderedness of the index which will matter, it is how the rows on a data page relate to the rows on the next data page. Consider that rows 4,2,3,1 are on page 11 and 7,8,6,5 on page 12. It is a reasonable argument using the index to read the rows would be valid, since by reading the page for row 1, rows 2 thru 4 would be in the FB cache, with the same applying to rows 5 thru 8. But there are plenty of possible cases where the keys order and the on disk page order would bear no relationship, because empty pages are reused. So, what would be approach if rows 4,2,3,1 are on page 21 and 7,8,6,5 on page 12. It is a reasonable navigate using the index? How would the stats determine if the index should be used> > > Further, wouldn't an Index clustering factor really be of little value as > > the > number of exceptions to the physical vs. primary key order increase? Since > any exception to key order would result in random IO, which can quickly kill > system performance vs. natural scan. > > We don't compare INDEX vs NATURAL, we compare INDEX vs > SORT(NATURAL). > There should be plenty random page jumps to exceed the external sort cost. I will have some comments on this but need to ask some questions first to make sure my understanding of the engine's sort logic is correct (will post on this right after this reply) > There's virtually no difference between scanning the index for IS NULL and > for IS NOT NULL. MGA effects and costs are absolutely the same. The only > thing that matters is the predicate selectivity. It's surely not a guarantee > anyway -- all versions with non-NULL key may be invisible to the current > transaction thus killing the performance. I see your point here. > But once again, it's absolutely the same as for IS NULL or any other > condition. > I fail to see why you insist on the problem for IS NOT NULL only. My statement was based on my memory of why the use of IS NOT NULL was disabled in earlier FB releases. If IS NOT NULL can be treated the same as any other condition, why isn't it now? Sean ------------------------------------------------------------------------------ "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE Instantly run your Selenium tests across 300+ browser/OS combos. Get unparalleled scalability from the best Selenium testing platform available. Simple to use. Nothing to install. Get started now for free." http://p.sf.net/sfu/SauceLabs Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel