I didn't mean to imply that this statistic is never important. Your data
sounds like the classic problem case. I assume the index on the
timestamp is not declared unique, but in reality it is unique or nearly
so. The default selectivity of something like 10% makes the optimizer
think far more rows will qualify than actually do. For a join the
estimate goes badly wrong once you start multiplying: the optimizer
assumes each join matches 10% of the rows, when the reality is probably
closer to 1 row.
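As a rough back-of-the-envelope sketch of that compounding effect (the table sizes and join count below are invented for illustration, not Derby internals), compare the estimated row counts under a 10% default selectivity against a nearly unique index where each probe matches about one row:

```python
# Hypothetical numbers: a 1M-row "log" table joined three times.
outer = 1_000_000          # rows in the driving table (assumed)
inner = 1_000_000          # rows in each joined table (assumed)
joins = 3                  # number of joins in the query (assumed)

est_fanout = inner // 10   # 10% default: 100,000 matching rows per probe
real_fanout = 1            # nearly unique index: ~1 matching row per probe

est = outer
real = outer
for _ in range(joins):
    est *= est_fanout      # estimate multiplies by 100,000 at every join
    real *= real_fanout    # reality stays at 1,000,000 rows

# est is now 10**21 rows versus a real 10**6 -- a factor of 10**15,
# which is more than enough to push the optimizer onto a terrible plan.
```

The absolute numbers are made up, but the shape of the error is the point: each join multiplies the estimate by the assumed fanout, so a wrong per-join selectivity compounds exponentially with the number of joins.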
For other "stats", Derby differs from most databases. Rather than
storing statistics such as histograms of data distributions, it examines
the actual indexes at compile time to determine the data distribution.
The row count is likewise part of the table structure itself rather than
the statistics table.
Matt Doran wrote:
Hi Mike,
Mike Matrigali wrote:
Most of the "stat" information that Derby uses is automatically kept up
to date as part of underlying index and table maintenance. This includes
the count of rows in the table and the distribution of data in indexes.
That might be the case, but the extra statistics calculated during a
compress obviously have a big impact on query optimization. My case
involved querying a typical "log" table with an indexed timestamp
column. In the query that ran extremely slowly, we were filtering on
that indexed date column to show only the last week's data (and doing a
number of joins). Without the stats the query took 22 minutes; with the
stats it ran in under a second.
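For context, the query was shaped roughly like this (table and column names here are invented; the actual schema wasn't posted):

```sql
-- Hypothetical sketch of the slow query: a range filter on an
-- indexed timestamp plus several joins.
SELECT e.event_time, e.message, u.name
FROM log_entry e
JOIN app_user u ON u.id = e.user_id
JOIN category c ON c.id = e.category_id
WHERE e.event_time > ?   -- bound to "one week ago" by the application
```

With no statistics, the optimizer has to guess the selectivity of the range predicate and of each join, which is where the 10% defaults described above come into play.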
I'm not sure where these other stats are stored (in the index structures
themselves?), but my DB didn't have any entries in the
sys.sysstatistics table. After performing a compress, the statistics
table was populated and the query performance improved dramatically.
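For anyone following along, the compress that refreshed the statistics can be run from ij roughly like this (the schema and table names are placeholders; SYSCS_UTIL.SYSCS_COMPRESS_TABLE is the real Derby system procedure):

```sql
-- Rebuild the table and its indexes; in my experience this also
-- populated sys.sysstatistics for the table's indexes.
CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'LOG_ENTRY', 1);

-- Check whether statistics rows now exist:
SELECT * FROM sys.sysstatistics;
```

Note that compress rebuilds the table, so it takes time and locks proportional to the table size; it's not something to run casually on a large, busy table.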
Thanks,
Matt