The technique of adding extra columns to indexes works well for Derby if
it matches your application's needs. The docs usually refer to this as a
covering index and the optimizer is pretty good at looking for cases
where it can use a covering index and avoid going to the base table.
Hopefully this will help your application.
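For example, with an index like this (the index name is just for
illustration; table and column names are from your test case below):

    create index ix_double_pk on the_table (the_double_column, pk_column)

a query that selects only pk_column with a range predicate on
the_double_column can be answered entirely from the index pages, with no
trip to the base rows.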
From the info posted, the query with the index is definitely doing an index
to base row lookup for every qualifying row. It looks like for this
data distribution and number of qualifying rows this is a worse plan than
just doing a full table scan.
The optimizer should have picked the base table scan
rather than the index to base row given how the 2 performed. I think
this is another case showing the costs for the optimizer need to be
updated to reflect current technology. There has been a lot of work to
make scans go fast and that is not reflected in the current costing.
The optimizer estimated 203425 rows for the index qualification and got
271136 which seems not too bad (5% vs. 6.8%). This info comes from the
with-index.txt query plan. The estimate assumes an equal distribution of
values across the whole range, so maybe this range was a
little hotter than others. Since the row count estimate looks close I
lean toward the base costing as the problem.
It would be interesting to know how much of the index/baserow
performance issue is that it keeps getting cache misses vs. the cpu
overhead of just doing the base row lookup for every row. For this db
it would take a 50,000 page cache just to cache the base rows, plus
whatever it takes to cache the index.
For this kind of distribution I have seen db's gather all the row
pointers from the index and then do ordered probes into the base table.
This ensures good caching for the lookups. Derby does not have this
technique available.
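Just to illustrate the idea (Derby does not do this internally today),
here is a rough application-level sketch. The database name is a
placeholder, pk_column is assumed to be a BIGINT, and this only helps to
the extent that pk_column order roughly matches physical row order, e.g.
if the rows were loaded in key order:

    import java.sql.*;
    import java.util.*;

    public class OrderedProbeSketch {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection("jdbc:derby:testdb");

            // Pass 1: collect just the qualifying keys
            // (cheap if the index also covers pk_column).
            List<Long> keys = new ArrayList<Long>();
            Statement s = conn.createStatement();
            ResultSet rs = s.executeQuery(
                "select pk_column from the_table " +
                "where the_double_column > 300 and the_double_column < 320");
            while (rs.next()) {
                keys.add(rs.getLong(1));
            }
            rs.close();
            s.close();

            // Pass 2: probe the base table in key order rather than index
            // order, so rows that share a page are fetched together.
            Collections.sort(keys);
            PreparedStatement probe = conn.prepareStatement(
                "select * from the_table where pk_column = ?");
            for (Long key : keys) {
                probe.setLong(1, key.longValue());
                ResultSet row = probe.executeQuery();
                if (row.next()) {
                    // read the other columns here
                }
                row.close();
            }
            probe.close();
            conn.close();
        }
    }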
Tim Dudgeon wrote:
Thanks for the comments. Here is some more info.
I attach the DDL for the table concerned, the simple test program I use
and the execution strategy with and without an index.
Some additional points:
1. The query returning 7% of the table is certainly not an extreme case.
The exact query criteria are specified by the user in the UI, and can be
much worse than this case. I have no control over the nature of the
query that the user specifies.
2. Yes, if the query is much more selective the index can be a help.
3. The biggest data file in seg0 is 1452572672 bytes in size (i.e. about 1.4GB).
4. The index was added after the rows were added.
5. Making the index also have pk_column as the second indexed column
makes it go like lightning! The search runs in 2 secs, about 14x faster.
So in summary it seems like an index will be of no help to me in this
situation, unless I make it an index on both columns.
Many thanks
Tim
Tim Dudgeon wrote:
I found an interesting performance problem that I'd welcome some help
in understanding.
I have a table of 4,000,000 rows that has a DOUBLE column containing
floating point numbers.
I run a query like this:
select pk_column from the_table where the_double_column > 300 and
the_double_column < 320
271136 rows are returned.
I then go through the ResultSet and extract all the pk_column values.
All of this is done using standard JDBC.
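In outline the code is just something like this (a simplified sketch, not
the exact test program; the database name is a placeholder and I'm
treating pk_column as a BIGINT):

    import java.sql.*;
    import java.util.*;

    public class RangeQueryTest {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection("jdbc:derby:testdb");
            long start = System.currentTimeMillis();

            List<Long> ids = new ArrayList<Long>();
            Statement s = conn.createStatement();
            ResultSet rs = s.executeQuery(
                "select pk_column from the_table " +
                "where the_double_column > 300 and the_double_column < 320");
            while (rs.next()) {
                ids.add(rs.getLong(1));   // collect pk_column for each row
            }
            rs.close();
            s.close();
            conn.close();

            System.out.println(ids.size() + " rows in "
                + (System.currentTimeMillis() - start) + " ms");
        }
    }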
When I do this it takes 23 seconds, which I thought was not
unreasonable as a full table scan was involved and the table was
pretty big.
But of course I thought an index would help, so I added an index on
the_double_column and repeated the process. It took 720 seconds,
31x slower! I thought this was strange, but suspected it might be
because I was using default settings and the 4MB page cache was much
too small to hold the index.
So I increased the page cache size (derby.storage.pageCacheSize
property) to 10x the default (10,000 pages) and repeated the process. There was
only a very minor improvement in speed.
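(The property can go in derby.properties in the database system
directory, or be set as a JVM system property before the Derby engine
boots; the class name and database name below are the usual embedded
setup, something like:

    public class CacheSizeTest {
        public static void main(String[] args) throws Exception {
            // must be set before the Derby engine is first loaded;
            // alternatively put derby.storage.pageCacheSize=10000
            // in derby.properties
            System.setProperty("derby.storage.pageCacheSize", "10000");
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            java.sql.Connection conn =
                java.sql.DriverManager.getConnection("jdbc:derby:testdb");
            // ... run the query as before ...
            conn.close();
        }
    }
)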
In all cases the memory usage, as reported by:
Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()
was really no different, and the used memory was much less than the maximum
available specified by the -Xmx setting.
Any ideas what to do?
Tim