Hi Daniel,
Here are two issues which might drive the optimizer to select a table scan:
1) The optimizer may see the OR as an unoptimizable condition. I think
Bryan referred to this.
2) The index does not cover the query, that is, it does not contain all
columns needed to evaluate the query.
A variation of your concatenation solution could involve a generated
column and a covering index involving that column. This would at least
let you preserve the more readable structure of your table for other
queries. Something like the following might work:
connect 'jdbc:derby:memory:temp;create=true';
CREATE TABLE Terms (
id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY NOT NULL,
field VARCHAR(255) NOT NULL,
text VARCHAR(255) NOT NULL,
docfreq INTEGER NOT NULL,
fieldtext generated always as ( field || ':' || text )
);
CREATE INDEX TermsFieldAndText ON Terms (fieldtext, docfreq);
-- just for show. this does not create enough data to justify
-- using an index
insert into Terms ( field, text, docfreq )
values ( 'body', 'hello', 3 ), ( 'tag', 'home', 2 ), ( 'tag', 'work', 4
), ( 'zzz', 'abc', 1 );
select substr( fieldtext, 1, locate( ':', fieldtext ) - 1 ), substr(
fieldtext, locate( ':', fieldtext ) + 1 ), docfreq
from Terms
where fieldtext > 'tag:home'
fetch first row only;
Hope this helps,
-Rick
Daniel Noll wrote:
Hi all.
I have a table and index like this:
CREATE TABLE Terms (
id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY NOT NULL,
field VARCHAR(255) NOT NULL,
text VARCHAR(255) NOT NULL,
docfreq INTEGER NOT NULL
)
CREATE INDEX TermsFieldAndText ON Terms (field, text)
Here's some example data. Table is shown in sorted order as it would
be listed in the index above:
| field | text | docfreq |
----------------------------------------
| body | hello | 10 |
| tag | home | 100 |
| tag | work | 80 |
For a given input term (let's say tag:home) I want to find the _next_
entry in the table (so it should return tag:work). My naive
implementation does this:
SELECT field, text, docfreq
FROM Terms
WHERE (field = ? AND text > ?) OR field > ?
ORDER BY field ASC, text ASC
This does the right thing as far as the output is concerned, but Derby
does a table scan. Looking at the query, my impression would be that
in the worse case scenario, Derby should do two index scans. Or, if
it's smart, one index scan.
I'm aware that I might be able to work around this by having instead a
field with the concatenation of both fields with
CONCAT(field,'\0',text). But I wanted to check first whether there is
a way I can maintain my relatively understandable table structure
before resorting to hacks. Basically I just need the data in index
order, starting from a given point... so I figure there is a simple
solution.
Daniel
--
Daniel Noll Forensic and eDiscovery Software
Senior Developer The world's most advanced
Nuix email data analysis
http://nuix.com/ and eDiscovery software