On Friday 11 November 2005 08:59, Kevin Hore wrote:
Hi,
Before complaining about the query optimizer, lets look at your design first.
Yeah, I know that Derby, like every other database is not perfect and could
always use improvements. But the reality is that the first place to look for
improvements is in your own code.
Since you've got a small table and only 260K rows, lets try these simple
experiments to see if it will help.
----PROBLEM DESCRIPTION----
Consider the table:
CREATE TABLE tblSearchDictionary
(
ObjectId int NOT NULL,
ObjectType int NOT NULL,
Word VARCHAR(64) NOT NULL,
WordLocation int NOT NULL,
CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
);
Just a first suggestion, and its really a nit, could you alter the table to
have Word as the last column?
CREATE TABLE tblSearchDictionary
(
objectId INT NOT NULL,
objectType INT NOT NULL,
wordLocation INT NOT NULL,
word VARCHAR(64) NOT NULL,
CONSTRAINT CONSd0e222 UNIQUE
(objectId,objectType,word,wordLocation)
);
This table has an index on each of the four columns, it also has the
unique index across all four columns as defined above:
Ok first suggestion. Drop all of your Indexes.
Based on your examples, these indexes are part of the problem.
For example an index on objectType doesn't make any sense.
First, in your example, you don't even use objectType in any of your queries.
Second. How many objectTypes are there? 10,100, 1000? The point is that
objectType is not an unique enough identifier to justify being its own index.
The same could be said for wordLocation. So why have an index on this column?
CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary (ObjectId);
CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
(ObjectType);
CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
(WordLocation);
Ok, since you believe that the Optimizer is having problems selecting the
correct Index to use, lets create your new indexes one by one to see how they
can improve performance.
Lets review your query statements:
The following query selects all rows that match instances of string in
the Word column. It sums the WordLocation column having grouped by the
ObjectId column.
SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT'
GROUP BY ObjectId;
On my machine this will usually complete in an acceptable time of around
200ms.
Ok, this is pretty straight forward enough.
A good index would be a composite index on objectID and word.
Now consider the following query which adds a second search term on the
same column.
SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT' OR Word = 'ADD'
GROUP BY ObjectId;
Ok, again if you create a composite index on objectID and Word, you will get
better performance.
As you point out below, the query may get rewritten. If you wanted to, you
could just rewrite your query to say "WHERE word IN ('CONTACT', 'ADD') but
I'd save that for an option to test as a further tweak.
So, as a test, do the following:
1) DROP ALL CURRENT INDEXes.
2) CREATE a composite index on objectID and Word.
Run your query tests to see what happens to performance.
3) To test your theory about the optimizer, rewrite your second query to use
the IN (xxx,xxx) option rather than OR.
In addition:
1) Recreate your table with your VARCHAR column as the last column.
2) Alter your table and replace your VARCHAR column with a fixed length
CHAR(64) column.
If you can, please post your results.
Do each change and see what happens to your query times.
HTH
-Gumby