Hi Kevin,
You might also try using a temporary table to split your scan up into a
series of optimzable queries. Regards-Rick
declare global temporary table session.accumulator
(
ObjectId int NOT NULL,
WordLocation int NOT NULL
)
not logged;
insert into session.accumulator
SELECT ObjectId, WordLocation
FROM tblSearchDictionary
WHERE Word = 'CONTACT'
;
insert into session.accumulator
SELECT ObjectId, WordLocation
FROM tblSearchDictionary
WHERE Word = 'ADD'
;
SELECT ObjectId, SUM(WordLocation) AS Score
FROM session.accumulator
GROUP BY ObjectId;
Mamta Satoor wrote:
Hi Kevin,
I haven't investigated Derby-47 to know how to fix the problem but I
do have an optimizer overrides patch waiting for review on the derby
developer list which will let user specify their own optimizer
properties to help the optimizer pick a specific plan. The JIRA entry
for optimizer overrides is Derby-573. The patch is for Derby 10.2.
These optimizer properties can be handy in a case like yours where
until the optimizer is fixed, a user can tell the optimizer what plan
to use. Just an FYI as I know it doesn't help with your particular
case since there is no optimizer overrides support for 10.1.
Mamta
On 11/11/05, *Kevin Hore* <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
The Derby query optimizer (this is with 10.1.1.0
<http://10.1.1.0>) decides to perform an
expensive table scan in certain circumstances, when it could make
use of
available indexes. The resulting poor performance is rendering Derby
useless for our application.
I believe that this behaviour might be related to DERBY-47
( http://issues.apache.org/jira/browse/DERBY-47), which was opened in
October 2004. However, this seems such a severe and fundamental
problem
that I'm surprised firstly that a significant portion of the Derby
user
base isn't affected by this and, secondly, that DERBY-47 hasn't been
considered a critical defect to be fixed with the utmost urgency.
I've described the problem in detail below, and I'd appreciate any
assistance. Specifically:
i) Does anyone have any plans to fix this problem?
ii) In the meantime, are there any work-arounds? I'd appreciate any
suggestions that would decrease the execution time of my second query
below (the one with with two search terms). Likewise, any general
strategies for avoiding this problem with IN clauses would be
appreciated.
----PROBLEM DESCRIPTION----
Consider the table:
CREATE TABLE tblSearchDictionary
(
ObjectId int NOT NULL,
ObjectType int NOT NULL,
Word VARCHAR(64) NOT NULL,
WordLocation int NOT NULL,
CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
);
This table has an index on each of the four columns, it also has the
unique index across all four columns as defined above:
CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary
(ObjectId);
CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
(ObjectType);
CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
(WordLocation);
The table contains about 260,000 rows.
The following query selects all rows that match instances of string in
the Word column. It sums the WordLocation column having grouped by
the
ObjectId column.
SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT'
GROUP BY ObjectId;
On my machine this will usually complete in an acceptable time of
around
200ms.
Now consider the following query which adds a second search term
on the
same column.
SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT' OR Word = 'ADD'
GROUP BY ObjectId;
This second query usually takes around 10000ms on my machine. My
understanding from the Derby optimizer docs and DERBY-47 is that
this is
because Derby is re-writing the query along the following lines, and
then choosing to do a table scan:
SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
Word IN ('CONTACT', 'ADD')
AND Word >= 'ADD'
AND Word <= 'CONTACT'
GROUP BY ObjectId;
The plan for the first query indicates that the
tblSearchDictionaryWord
index is used to perform an index scan. However, the plan for the
second
query indicates that the majority of the additional time is taken
performing a table scan over the entire table, instead of making
use of
the indexes available. Our application uses IN quite frequently,
so this
optimizer behaviour would seem to present a significant problem.
---QUERY PLAN FOR FIRST QUERY----
Statement Name:
null
Statement Text:
SELECT
ObjectId,
SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
Word = 'CONTACT'
GROUP BY
ObjectId
Parse Time: 0
Bind Time: 0
Optimize Time: 16
Generate Time: 0
Compile Time: 16
Execute Time: 0
Begin Compilation Timestamp : 2005-11-11 12:28:52.765
End Compilation Timestamp : 2005-11-11 12:28: 52.781
Begin Execution Timestamp : 2005-11-11 13:08:15.406
End Execution Timestamp : 2005-11-11 13:08:15.406
Statement Execution Plan Text:
Project-Restrict ResultSet (5):
Number of opens = 1
Rows seen = 93
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 1.00
optimizer estimated cost: 226.00
Source result set:
Grouped Aggregate ResultSet:
Number of opens = 1
Rows input = 113
Has distinct aggregate = false
In sorted order = false
Sort information:
Number of rows input=113
Number of rows output=93
Sort type=internal
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 1.00
optimizer estimated cost: 226.00
Source result set:
Project-Restrict ResultSet (4):
Number of opens = 1
Rows seen = 113
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 118.00
optimizer estimated cost: 226.00
Source result set:
Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
Number of opens = 1
Rows seen = 113
Columns accessed from heap = {0, 3}
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 118.00
optimizer estimated cost: 226.00
Index Scan ResultSet for TBLSEARCHDICTIONARY using
index
TBLSEARCHDICTIONARYWORD at read committed isolation level using share
row locking chosen by the optimizer
Number of opens = 1
Rows seen = 113
Rows filtered = 0
Fetch Size = 1
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched=All
Number of columns fetched=2
Number of deleted rows visited=0
Number of pages visited=4
Number of rows qualified=113
Number of rows visited=114
Scan type=btree
Tree height=3
start position:
>= on first 1 column(s).
Ordered null semantics on the following columns:
0
stop position:
> on first 1 column(s).
Ordered null semantics on the following columns:
0
qualifiers:
None
optimizer estimated row count: 118.00
optimizer estimated cost: 226.00
---QUERY PLAN FOR SECOND QUERY----
Statement Name:
null
Statement Text:
SELECT
ObjectId,
SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
Word = 'CONTACT' OR Word = 'ADD'
GROUP BY
ObjectId
Parse Time: 0
Bind Time: 0
Optimize Time: 0
Generate Time: 15
Compile Time: 15
Execute Time: 4250
Begin Compilation Timestamp : 2005-11-11 13:16:17.578
End Compilation Timestamp : 2005-11-11 13:16: 17.593
Begin Execution Timestamp : 2005-11-11 13:16:17.593
End Execution Timestamp : 2005-11-11 13:16:27.437
Statement Execution Plan Text:
Project-Restrict ResultSet (5):
Number of opens = 1
Rows seen = 100
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 4250
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 1.00
optimizer estimated cost: 82959.49
Source result set:
Grouped Aggregate ResultSet:
Number of opens = 1
Rows input = 712
Has distinct aggregate = false
In sorted order = false
Sort information:
Number of rows input=712
Number of rows output=593
Sort type=internal
constructor time (milliseconds) = 0
open time (milliseconds) = 4250
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 1.00
optimizer estimated cost: 82959.49
Source result set:
Project-Restrict ResultSet (4):
Number of opens = 1
Rows seen = 712
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 4219
close time (milliseconds) = 15
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 19200.45
optimizer estimated cost: 82959.49
Source result set:
Project-Restrict ResultSet (3):
Number of opens = 1
Rows seen = 40806
Rows filtered = 40094
restriction = true
projection = false
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 4219
close time (milliseconds) = 15
restriction time (milliseconds) = 124
projection time (milliseconds) = 0
optimizer estimated row count: 19200.45
optimizer estimated cost: 82959.49
Source result set:
Table Scan ResultSet for TBLSEARCHDICTIONARY at read
committed
isolation level using share row locking chosen by the optimizer
Number of opens = 1
Rows seen = 40806
Rows filtered = 0
Fetch Size = 1
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 4001
close time (milliseconds) = 15
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched={0, 2, 3}
Number of columns fetched=3
Number of pages visited=2978
Number of rows qualified=40806
Number of rows visited=256001
Scan type=heap
start position:
null stop position:
null qualifiers:
Column[0][0] Id: 2
Operator: <
Ordered nulls: false
Unknown return value: true
Negate comparison result: true
Column[0][1] Id: 2
Operator: <=
Ordered nulls: false
Unknown return value: false
Negate comparison result: false
optimizer estimated row count: 19200.45
optimizer estimated cost: 82959.49
----------
Thanks in advance for any help!
Kind regards,
Kevin Hore