Thanks for the comments. Here is some more info.
I attach the DDL for the table concerned, the simple test program I use
and the execution strategy with and without an index.
Some additional points:
1. the query returning 7% of the table is certainly not an extreme case.
The exact query criteria are specified by the user in the UI, and can be
much worse than this case. I have no control over the natur eof the
query that the user specifies.
2. Yes, if the query is much more selective the index can be a help.
3. The biggest data file in seg0 is 1452572672 bytes in size (e.g. 1.4GB).
4. the index was added after the rows were added.
5. making the index also have the pk_column as the second indexed field
makes it go like lightning! search runs in 2 secs, about 14x faster.
So in summary it seems like an index will be of no help to me in this
situation, unless I make it an index of both columns.
Many thanks
Tim
Tim Dudgeon wrote:
I found an interesting preformace problem that I'd welcome some help in
inderstanding.
I have a table of 4,000,000 rows that has a DOUBLE column containing
floating point numbers.
I run a query like this:
select pk_column from the_table where the_double_column > 300 and
the_double_column < 320
271136 rows are returned.
I then go through the ResultSet and extract all the id_column values.
All of this is done using standard JDBC.
When I do this it takes 23 seconds, which I though was not unreasonable
as a full table scan was involved and the table was pretty big.
But of course I thought an index would help, so I added an index to the
the_double_column and repeated the process. It took 720 seconds, 31x
slower! I thought this was strange, but thought it might be because I
was using standard settings and the 4MB page cache was much too small to
hold the index.
So I increased the page cache size (derby.storage.pageCacheSize
property) to 10x the size (10,000) and repeated the process. There was
only a very minor improvement in speed.
In all cases the memory usage, as reported by:
Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()
really no differnt, and the used memory was much less that the maximum
available specified by the -Xmx setting.
Any ideas what to do?
Tim
-- ----------------------------------------------
-- DDL Statements for tables
-- ----------------------------------------------
CREATE TABLE "APP"."NCI_AUG00_1M" ("CD_ID" INTEGER NOT NULL GENERATED BY
DEFAULT AS IDENTITY (START WITH 1, INCREMENT BY 1), "CD_STRUCTURE"
BLOB(2147483647) NOT NULL, "CD_SMILES" VARCHAR(1000), "CD_FORMULA"
VARCHAR(100), "CD_SORTABLE_FORMULA" VARCHAR(255), "CD_MOLWEIGHT" DOUBLE,
"CD_HASH" INTEGER NOT NULL, "CD_FLAGS" VARCHAR(20), "CD_TIMESTAMP" TIMESTAMP
NOT NULL, "CD_FP1" INTEGER NOT NULL, "CD_FP2" INTEGER NOT NULL, "CD_FP3"
INTEGER NOT NULL, "CD_FP4" INTEGER NOT NULL, "CD_FP5" INTEGER NOT NULL,
"CD_FP6" INTEGER NOT NULL, "CD_FP7" INTEGER NOT NULL, "CD_FP8" INTEGER NOT
NULL, "CD_FP9" INTEGER NOT NULL, "CD_FP10" INTEGER NOT NULL, "CD_FP11" INTEGER
NOT NULL, "CD_FP12" INTEGER NOT NULL, "CD_FP13" INTEGER NOT NULL, "CD_FP14"
INTEGER NOT NULL, "CD_FP15" INTEGER NOT NULL, "CD_FP16" INTEGER NOT NULL,
"FIELD_0" INTEGER);
-- ----------------------------------------------
-- DDL Statements for indexes
-- ----------------------------------------------
CREATE INDEX "APP"."IDX_MOL_WEIGHT" ON "APP"."NCI_AUG00_1M" ("CD_MOLWEIGHT");
CREATE INDEX "APP"."NCI_AUG00_1M_FX" ON "APP"."NCI_AUG00_1M"
("CD_SORTABLE_FORMULA");
CREATE INDEX "APP"."NCI_AUG00_1M_HX" ON "APP"."NCI_AUG00_1M" ("CD_HASH");
-- ----------------------------------------------
-- DDL Statements for keys
-- ----------------------------------------------
-- primary/unique
ALTER TABLE "APP"."NCI_AUG00_1M" ADD CONSTRAINT "SQL090324105153810" PRIMARY
KEY ("CD_ID");
/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
package javaapplication1;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Types;
import java.util.ArrayList;
import java.util.List;
/**
*
* @author timbo
*/
public class Main {
String URL =
"jdbc:derby:/home/timbo/IJCProjects/ijc-project-0000/.config/localdb/db";
String DRIVER = "";
Connection con = null;
String sql = "select distinct cd_id from NCI_AUG00_1M where cd_molweight >
300 and cd_molweight < 320";
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws Exception {
Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
Main m = new Main();
//m.setPageData();
//m.getPageData();
m.runMWSearch();
}
Connection getConnection() throws SQLException {
if (con == null) {
con = DriverManager.getConnection(URL);
}
return con;
}
void runMWSearch() throws SQLException {
Statement stmt = getConnection().createStatement();
long max = Runtime.getRuntime().maxMemory();
System.out.println("Max memory = " + max);
long t0 = System.currentTimeMillis();
ResultSet rs = stmt.executeQuery(sql);
long t1 = System.currentTimeMillis();
List vals = new ArrayList();
long total = 0;
long free = 0;
int count = 0;
while (rs.next()) {
count++;
int id = rs.getInt(1);
vals.add(id);
if (count % 1000 == 0) {
//Runtime.getRuntime().gc();
total = Runtime.getRuntime().totalMemory();
free = Runtime.getRuntime().freeMemory();
System.out.println(" " + count + " => " + (total - free));
}
}
rs.close();
stmt.close();
long t2 = System.currentTimeMillis();
Runtime.getRuntime().gc();
total = Runtime.getRuntime().totalMemory();
free = Runtime.getRuntime().freeMemory();
System.out.println("Number of hits: " + vals.size());
System.out.println("Query took: " + (t1 - t0));
System.out.println("Retrieve took: " + (t2 - t1));
System.out.println("Memory: " + (total - free));
}
void setPageData() throws SQLException {
String val = "1000";
CallableStatement cs = getConnection().prepareCall("CALL
SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY(?, ?)"); // NOI18N
cs.setString(1, "derby.storage.pageCacheSize"); // NOI18N
cs.setString(2, val); // NOI18N
//cs.setNull(2, Types.VARCHAR);
cs.execute();
cs.close();
System.out.println("derby.storage.pageCacheSize set to " + val);
}
void getPageData() throws SQLException {
String sql = "VALUES
SYSCS_UTIL.SYSCS_GET_DATABASE_PROPERTY('derby.storage.pageCacheSize')";
Statement stmt = getConnection().createStatement();
ResultSet rs = stmt.executeQuery(sql);
String val = null;
if (rs.next()) {
val = rs.getString(1);
}
rs.close();
stmt.close();
System.out.println("derby.storage.pageCacheSize=" + val);
}
}
2009-03-26 11:55:48.894 GMT Thread[main,5,main] (XID = 9540524), (SESSIONID =
0), select distinct cd_id from NCI_AUG00_1M where cd_molweight > 300 and
cd_molweight < 320 ******* Project-Restrict ResultSet (2):
Number of opens = 1
Rows seen = 271136
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 436037.89
optimizer estimated cost: 6339476.19
Source result set:
Table Scan ResultSet for NCI_AUG00_1M at read committed isolation level
using instantaneous share row locking chosen by the optimizer
Number of opens = 1
Rows seen = 271136
Rows filtered = 0
Fetch Size = 16
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched={0, 5}
Number of columns fetched=2
Number of pages visited=44327
Number of rows qualified=271136
Number of rows visited=4004016
Scan type=heap
start position:
null stop position:
null qualifiers:
Column[0][0] Id: 5
Operator: <
Ordered nulls: false
Unknown return value: false
Negate comparison result: false
Column[0][1] Id: 5
Operator: <=
Ordered nulls: false
Unknown return value: true
Negate comparison result: true
optimizer estimated row count: 436037.89
optimizer estimated cost: 6339476.19
2009-03-26 12:19:39.988 GMT Thread[main,5,main] (XID = 9621712), (SESSIONID =
0), select distinct cd_id from NCI_AUG00_1M where cd_molweight > 300 and
cd_molweight < 320 ******* Project-Restrict ResultSet (3):
Number of opens = 1
Rows seen = 271136
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 203425.00
optimizer estimated cost: 644900.84
Source result set:
Index Row to Base Row ResultSet for NCI_AUG00_1M:
Number of opens = 1
Rows seen = 271136
Columns accessed from heap = {0, 5}
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 203425.00
optimizer estimated cost: 644900.84
Index Scan ResultSet for NCI_AUG00_1M using index
IDX_MOL_WEIGHT at read committed isolation level using instantaneous share row
locking chosen by the optimizer
Number of opens = 1
Rows seen = 271136
Rows filtered = 0
Fetch Size = 16
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched=All
Number of columns fetched=2
Number of deleted rows visited=0
Number of pages visited=1816
Number of rows qualified=271136
Number of rows visited=271137
Scan type=btree
Tree height=-1
start position:
> on first 1 column(s).
Ordered null semantics on the following columns:
stop position:
>= on first 1 column(s).
Ordered null semantics on the following columns:
qualifiers:
None
optimizer estimated row count: 203425.00
optimizer estimated cost: 644900.84