Re: Poor query optimizer choices is making Derby unusable for large tables

Michael Segel Mon, 14 Nov 2005 05:14:19 -0800

On Saturday 12 November 2005 16:04, Michael Segel wrote:
> On Saturday 12 November 2005 10:28, Michael Segel wrote:
> >     CREATE TABLE tblSearchDictionary
> >     (
> >              objectId               INT NOT NULL,
> >              objectType     INT NOT NULL,
> >              wordLocation   INT NOT NULL,
> >              word           VARCHAR(64) NOT NULL,
> >             CONSTRAINT CONSd0e222 UNIQUE 
> > (objectId,objectType,word,wordLocation)
> >     );
> >
> > > This table has an index on each of the four columns, it also has the
> > > unique index across all four columns as defined above:
>
> As a follow up...
> Change the constraint order to UNIQUE(objectId, word, wordLocation,
> objectType);
>
> You probably can drop the index I asked you to create if you make this mod.
> Then your query should use the unique constraint's "backing index"...
>
> HTH.
>
> -G


If you want to make this a little bit more efficient, you could try the 
following:

1) Create two tables. 

CREATE TABLE wordKey
(
        wordID  INT  NOT NULL GENERATED ALWAYS AS IDENTITY,
        word            VARCHAR(128) NOT NULL,
        CONSTRAINT      wordKeyCONS01 UNIQUE (wordID, word)
);
// Note this should create the backing INDEX so you don't need to create an 
INDEX

        CREATE TABLE tblSearchDictionary
        (
                 objectId               INT NOT NULL,
                 objectType     INT NOT NULL,
                 wordLocation   INT NOT NULL,
                 wordID                 INT NOT NULL,
                CONSTRAINT CONSd0e222 UNIQUE (objectId,wordID, wordLocation, 
objectType)
        );
// Note: I didn't see the SQL Syntax for creating a foreign key on wordID but 
you'll want to do that and to also for cascading deletes, even though you may 
not ever delete a word from wordKey...

It looks like you're trying to build your own word search index of documents.
If you you'll want to automate it rather than use static queries.

So lets say you're getting a text input of "CONSTRAINT  OR BAR AND FOO".
I would interpret this to mean select documents that either contain the word 
"CONSTRAINT" or the words "FOO" AND "BAR".
Note that you're going to create a parse tree that says node one contains 1 
item, "CONSTRAINT", node two contains 2 items, "FOO", "BAR"...

So you would want to create a SELECT statement that is equivlent to:

SELECT objectID, SUM(wordLocation) SCORE 
FROM    tblSearchDictionary
WHERE  wordID = xxx
UNION (
        SELECT objectID, SUM(wordLocation) SCORE
        FROM tblSearchDictionary
        WHERE wordID IN (yyy,zzz)
)...
GROUP BY objectID

Where xxx, yyy, zzz are integer values that you got from your look up.

While this isn't the complete solution, it should point you in the correct 
direction and should work independent of some kludges in the optimizer.

Your wordKey look up statement would be a prepared statement, while your 
actual scoring routine would be a dynamic statement.  

This should give you what you want.

I believe that there may be more to this for example what happens if you have  
a document that contains all three words ....

I'd also recommend the lookup table as a way to decrease the size of your 
overall solution and it may change how your optimizer looks at things. 
(You're doing your main query all on ints...)

But hey, what do I know? I am just having my first cup of joe and you get what 
you paid for... ;-)

-G

-- 
Michael Segel
Principal 
MSCC

Re: Poor query optimizer choices is making Derby unusable for large tables

Reply via email to