Hi Satheesh

Satheesh Bandaram wrote:
Hi Kevin,

Kevin Hore wrote:


i) Does anyone have any plans to fix this problem?


Can you file an enhancement request for this? I think Derby could
improve it's handling of OR/IN clauses. Many databases don't optimize OR
clauses as much as possible, though some do better than others. It would
be great if Derby could internally process this as two different scans
(one for 'CONTACT' and another for 'ADD') and then combine the results.
Some databases can do this. However, the workaround suggested by Jeff L.
does this, though you have to rewrite the query.
I've commented on the re-write solution elsewhere. Regarding an
enhancement, I think DERBY-47 pretty much covers the problem, but I'll
add a comment to that describing my circumstances.


Satheesh


ii) In the meantime, are there any work-arounds? I’d appreciate any
suggestions that would decrease the execution time of my second query
below (the one with with two search terms). Likewise, any general
strategies for avoiding this problem with IN clauses would be
appreciated.


----PROBLEM DESCRIPTION----

Consider the table:

CREATE TABLE tblSearchDictionary
(
ObjectId int NOT NULL,
ObjectType int NOT NULL,
Word VARCHAR(64) NOT NULL,
WordLocation int NOT NULL,
CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
);

This table has an index on each of the four columns, it also has the
unique index across all four columns as defined above:

CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary
(ObjectId);
CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
(ObjectType);
CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
(WordLocation);

The table contains about 260,000 rows.

The following query selects all rows that match instances of string in
the Word column. It sums the WordLocation column having grouped by the
ObjectId column.

SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT'
GROUP BY ObjectId;

On my machine this will usually complete in an acceptable time of
around 200ms.

Now consider the following query which adds a second search term on
the same column.

SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT' OR Word = 'ADD'
GROUP BY ObjectId;

This second query usually takes around 10000ms on my machine. My
understanding from the Derby optimizer docs and DERBY-47 is that this
is because Derby is re-writing the query along the following lines,
and then choosing to do a table scan:

SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
 Word IN ('CONTACT', 'ADD')
 AND Word >= 'ADD'
 AND Word <= 'CONTACT'
GROUP BY ObjectId;

The plan for the first query indicates that the tblSearchDictionaryWord
index is used to perform an index scan. However, the plan for the second
query indicates that the majority of the additional time is taken
performing a table scan over the entire table, instead of making use of
the indexes available. Our application uses IN quite frequently, so
this optimizer behaviour would seem to present a significant problem.

---QUERY PLAN FOR FIRST QUERY----

Statement Name:
   null
Statement Text:
   SELECT
   ObjectId,
   SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
       Word = 'CONTACT'
GROUP BY
   ObjectId

Parse Time: 0
Bind Time: 0
Optimize Time: 16
Generate Time: 0
Compile Time: 16
Execute Time: 0
Begin Compilation Timestamp : 2005-11-11 12:28:52.765
End Compilation Timestamp : 2005-11-11 12:28:52.781
Begin Execution Timestamp : 2005-11-11 13:08:15.406
End Execution Timestamp : 2005-11-11 13:08:15.406
Statement Execution Plan Text:
Project-Restrict ResultSet (5):
Number of opens = 1
Rows seen = 93
Rows filtered = 0
restriction = false
projection = true
   constructor time (milliseconds) = 0
   open time (milliseconds) = 0
   next time (milliseconds) = 0
   close time (milliseconds) = 0
   restriction time (milliseconds) = 0
   projection time (milliseconds) = 0
   optimizer estimated row count:            1.00
   optimizer estimated cost:          226.00

Source result set:
   Grouped Aggregate ResultSet:
   Number of opens = 1
   Rows input = 113
   Has distinct aggregate = false
   In sorted order = false
   Sort information:
       Number of rows input=113
       Number of rows output=93
       Sort type=internal
       constructor time (milliseconds) = 0
       open time (milliseconds) = 0
       next time (milliseconds) = 0
       close time (milliseconds) = 0
       optimizer estimated row count:            1.00
       optimizer estimated cost:          226.00

   Source result set:
       Project-Restrict ResultSet (4):
       Number of opens = 1
       Rows seen = 113
       Rows filtered = 0
       restriction = false
       projection = true
           constructor time (milliseconds) = 0
           open time (milliseconds) = 0
           next time (milliseconds) = 0
           close time (milliseconds) = 0
           restriction time (milliseconds) = 0
           projection time (milliseconds) = 0
           optimizer estimated row count:          118.00
           optimizer estimated cost:          226.00

       Source result set:
           Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
           Number of opens = 1
           Rows seen = 113
           Columns accessed from heap = {0, 3}
               constructor time (milliseconds) = 0
               open time (milliseconds) = 0
               next time (milliseconds) = 0
               close time (milliseconds) = 0
               optimizer estimated row count:          118.00
               optimizer estimated cost:          226.00

               Index Scan ResultSet for TBLSEARCHDICTIONARY using index
TBLSEARCHDICTIONARYWORD at read committed isolation level using share
row locking chosen by the optimizer
               Number of opens = 1
               Rows seen = 113
               Rows filtered = 0
               Fetch Size = 1
                   constructor time (milliseconds) = 0
                   open time (milliseconds) = 0
                   next time (milliseconds) = 0
                   close time (milliseconds) = 0
                   next time in milliseconds/row = 0

               scan information:
                   Bit set of columns fetched=All
                   Number of columns fetched=2
                   Number of deleted rows visited=0
                   Number of pages visited=4
                   Number of rows qualified=113
                   Number of rows visited=114
                   Scan type=btree
                   Tree height=3
                   start position:
   >= on first 1 column(s).
   Ordered null semantics on the following columns:
0
                   stop position:
   > on first 1 column(s).
   Ordered null semantics on the following columns:
0
                   qualifiers:
None
                   optimizer estimated row count:          118.00
                   optimizer estimated cost:          226.00


---QUERY PLAN FOR SECOND QUERY----

Statement Name:
   null
Statement Text:
   SELECT
   ObjectId,
   SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
       Word = 'CONTACT' OR  Word = 'ADD'
GROUP BY
   ObjectId

Parse Time: 0
Bind Time: 0
Optimize Time: 0
Generate Time: 15
Compile Time: 15
Execute Time: 4250
Begin Compilation Timestamp : 2005-11-11 13:16:17.578
End Compilation Timestamp : 2005-11-11 13:16:17.593
Begin Execution Timestamp : 2005-11-11 13:16:17.593
End Execution Timestamp : 2005-11-11 13:16:27.437
Statement Execution Plan Text:
Project-Restrict ResultSet (5):
Number of opens = 1
Rows seen = 100
Rows filtered = 0
restriction = false
projection = true
   constructor time (milliseconds) = 0
   open time (milliseconds) = 4250
   next time (milliseconds) = 0
   close time (milliseconds) = 0
   restriction time (milliseconds) = 0
   projection time (milliseconds) = 0
   optimizer estimated row count:            1.00
   optimizer estimated cost:        82959.49

Source result set:
   Grouped Aggregate ResultSet:
   Number of opens = 1
   Rows input = 712
   Has distinct aggregate = false
   In sorted order = false
   Sort information:
       Number of rows input=712
       Number of rows output=593
       Sort type=internal
       constructor time (milliseconds) = 0
       open time (milliseconds) = 4250
       next time (milliseconds) = 0
       close time (milliseconds) = 0
       optimizer estimated row count:            1.00
       optimizer estimated cost:        82959.49

   Source result set:
       Project-Restrict ResultSet (4):
       Number of opens = 1
       Rows seen = 712
       Rows filtered = 0
       restriction = false
       projection = true
           constructor time (milliseconds) = 0
           open time (milliseconds) = 0
           next time (milliseconds) = 4219
           close time (milliseconds) = 15
           restriction time (milliseconds) = 0
           projection time (milliseconds) = 0
           optimizer estimated row count:        19200.45
           optimizer estimated cost:        82959.49

       Source result set:
           Project-Restrict ResultSet (3):
           Number of opens = 1
           Rows seen = 40806
           Rows filtered = 40094
           restriction = true
           projection = false
               constructor time (milliseconds) = 0
               open time (milliseconds) = 0
               next time (milliseconds) = 4219
               close time (milliseconds) = 15
               restriction time (milliseconds) = 124
               projection time (milliseconds) = 0
               optimizer estimated row count:        19200.45
               optimizer estimated cost:        82959.49

           Source result set:
               Table Scan ResultSet for TBLSEARCHDICTIONARY at read
committed
isolation level using share row locking chosen by the optimizer
               Number of opens = 1
               Rows seen = 40806
               Rows filtered = 0
               Fetch Size = 1
                   constructor time (milliseconds) = 0
                   open time (milliseconds) = 0
                   next time (milliseconds) = 4001
                   close time (milliseconds) = 15
                   next time in milliseconds/row = 0

               scan information:
                   Bit set of columns fetched={0, 2, 3}
                   Number of columns fetched=3
                   Number of pages visited=2978
                   Number of rows qualified=40806
                   Number of rows visited=256001
                   Scan type=heap
                   start position:
null                    stop position:
null                    qualifiers:
Column[0][0] Id: 2
Operator: <
Ordered nulls: false
Unknown return value: true
Negate comparison result: true
Column[0][1] Id: 2
Operator: <=
Ordered nulls: false
Unknown return value: false
Negate comparison result: false

                   optimizer estimated row count:        19200.45
                   optimizer estimated cost:        82959.49

----------

Thanks in advance for any help!

Kind regards,


Kevin Hore








Reply via email to