Hi Michael,

Thank you for yours suggestions. You've obviously very kindly given it
quite a bit a thought.

I'd already tried a bunch of the things that you have suggested.
Defining a composite key and removing the existing indexes did nothing
to improve performance. Re-writing using IN clause produced no
improvement (indeed, I'm pretty sure that this is what Derby is doing
anyway).

I ought to point out here that my comments on how the optimizer would
re-write my query is based on the Derby documentation found at
http://db.apache.org/derby/docs/10.1/tuning/rtuntransform582.html

Unfortunately (for me) this is merely one simple example of a whole
class of problems with different queries involving different tables in
the system I'm working on, and the problem is fundamentally that
multiple IN/WHERE terms aren't being handled sensibly, as summarized by
Satheesh Bandaram in his post. Derby is able to choose a sensible
strategy with just one term in the WHERE/IN clause -- it's just the case
of multiple terms where it chooses inappropriate strategies.

I will add my thoughts to DERBY-47 regarding this.

Thanks again,

Kevin Hore


Michael Segel wrote:
On Friday 11 November 2005 08:59, Kevin Hore wrote:
Hi,

Before complaining about the query optimizer, lets look at your design first.
Yeah, I know that Derby, like every other database is not perfect and could always use improvements. But the reality is that the first place to look for improvements is in your own code.

Since you've got a small table and only 260K rows, lets try these simple experiments to see if it will help.


----PROBLEM DESCRIPTION----

Consider the table:

CREATE TABLE tblSearchDictionary
(
ObjectId int NOT NULL,
ObjectType int NOT NULL,
Word VARCHAR(64) NOT NULL,
WordLocation int NOT NULL,
CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
);


Just a first suggestion, and its really a nit, could you alter the table to have Word as the last column?
        CREATE TABLE tblSearchDictionary
        (
                 objectId               INT NOT NULL,
                 objectType     INT NOT NULL,
                 wordLocation   INT NOT NULL,
                 word           VARCHAR(64) NOT NULL,
                CONSTRAINT CONSd0e222 UNIQUE 
(objectId,objectType,word,wordLocation)
        );

This table has an index on each of the four columns, it also has the
unique index across all four columns as defined above:


Ok first suggestion. Drop all of your Indexes.
Based on your examples, these indexes are part of the problem.

For example an index on objectType doesn't make any sense.
First, in your example, you don't even use objectType in any of your queries.
Second. How many objectTypes are there? 10,100, 1000? The point is that objectType is not an unique enough identifier to justify being its own index.
The same could be said for wordLocation. So why have an index on this column?



CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary (ObjectId);
CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
(ObjectType);
CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
(WordLocation);



Ok, since you believe that the Optimizer is having problems selecting the correct Index to use, lets create your new indexes one by one to see how they can improve performance.

Lets review your query statements:


The following query selects all rows that match instances of string in
the Word column. It sums the WordLocation column having grouped by the
ObjectId column.

SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT'
GROUP BY ObjectId;

On my machine this will usually complete in an acceptable time of around
200ms.


Ok, this is pretty straight forward enough.

A good index would be a composite index on objectID and word.


Now consider the following query which adds a second search term on the
same column.

SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT' OR Word = 'ADD'
GROUP BY ObjectId;



Ok, again if you create a composite index on objectID and Word, you will get better performance.

As you point out below, the query may get rewritten. If you wanted to, you could just rewrite your query to say "WHERE word IN ('CONTACT', 'ADD') but I'd save that for an option to test as a further tweak.


So, as a test, do the following:

1) DROP ALL CURRENT INDEXes.
2) CREATE a composite index on objectID and Word.

Run your query tests to see what happens to performance.

3) To test your theory about the optimizer, rewrite your second query to use the IN (xxx,xxx) option rather than OR.

In addition:

1) Recreate your table with your VARCHAR column as the last column.
2) Alter your table and replace your VARCHAR column with a fixed length CHAR(64) column.

If you can, please post your results.
Do each change and see what happens to your query times.

HTH

-Gumby



Reply via email to