2011/9/13 Svein Erling Tysvær <[email protected]>
> > > Firebird doesn't have histograms yet, so it knows nothing about which field > values are common and which are rare when it chooses between using the index > of a probably reasonably selective field (I assume you have lots of records > in maestro_turnos and that 300K is only a small fraction) > ... > In the future, I think Firebird will implement histograms (maybe it is > already part of Firebird 3?), and then I guess it will make the right > decision in cases like yours. > In case there's someone out there wondering what histograms are in this context and what they have to do with optimization, let me elaborate a bit. Firebird keeps the selectivity of indexes and in the case of compound indexes, the selectivity of each field in the key. The selectivity is the number of distinct values compared to the total number of entries in the index. So, if you've got a table with 100,000 records and one of the fields has five distinct values, an index on that field would have a terrible selectivity. On the other hand, if 999,900 records had one value for the field and the other four values were distributed among the remaining 100 records, then for those four values, the index would have good selectivity. A histogram records the distribution of values in an index, so it's a much more sensitive way of determining the usefulness of an index, providing you know the values you're looking for. Different databases optimize queries at different times. Firebird optimizes its queries when they are presented so for joins and prepared queries with parameters, it doesn't know all the input values. Some other systems delay optimization until all parameters are known, so the inner part of a join is optimized for each new value from the outer part of the join and a parameterized query is optimized for each new value of the parameters. Systems like that can make great use of histograms, but at a significant cost for re-optimization. If I understood the query presented, it was looking for records in a specific range of dates. Range queries are particularly awkward to optimize based on selectivity. And the dates were explicit. So for that particular query, histograms would be a great help. Good luck, Ann [Non-text portions of this message have been removed]
