Hi Daniel,

Here are two issues which might drive the optimizer to select a table scan:

1) The optimizer may see the OR as an unoptimizable condition. I think Bryan referred to this.

2) The index does not cover the query, that is, it does not contain all columns needed to evaluate the query.

A variation of your concatenation solution could involve a generated column and a covering index involving that column. This would at least let you preserve the more readable structure of your table for other queries. Something like the following might work:

connect 'jdbc:derby:memory:temp;create=true';

CREATE TABLE Terms (
      id        INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY NOT NULL,
      field     VARCHAR(255) NOT NULL,
      text      VARCHAR(255) NOT NULL,
      docfreq   INTEGER NOT NULL,
      fieldtext generated always as ( field || ':' || text )
);
CREATE INDEX TermsFieldAndText ON Terms (fieldtext, docfreq);

-- just for show. this does not create enough data to justify
-- using an index
insert into Terms ( field, text, docfreq )
values ( 'body', 'hello', 3 ), ( 'tag', 'home', 2 ), ( 'tag', 'work', 4 ), ( 'zzz', 'abc', 1 );

select substr( fieldtext, 1, locate( ':', fieldtext ) - 1 ), substr( fieldtext, locate( ':', fieldtext ) + 1 ), docfreq
from Terms
where fieldtext > 'tag:home'
fetch first row only;

Hope this helps,
-Rick

Daniel Noll wrote:
Hi all.

I have a table and index like this:

   CREATE TABLE Terms (
       id        INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY NOT NULL,
       field     VARCHAR(255) NOT NULL,
       text      VARCHAR(255) NOT NULL,
       docfreq   INTEGER NOT NULL
   )

   CREATE INDEX TermsFieldAndText ON Terms (field, text)

Here's some example data.  Table is shown in sorted order as it would
be listed in the index above:

| field     | text      | docfreq      |
----------------------------------------
| body      | hello     | 10           |
| tag       | home      | 100          |
| tag       | work      | 80           |

For a given input term (let's say tag:home) I want to find the _next_
entry in the table (so it should return tag:work).  My naive
implementation does this:

   SELECT field, text, docfreq
   FROM Terms
   WHERE (field = ? AND text > ?) OR field > ?
   ORDER BY field ASC, text ASC

This does the right thing as far as the output is concerned, but Derby
does a table scan.  Looking at the query, my impression would be that
in the worse case scenario, Derby should do two index scans.  Or, if
it's smart, one index scan.

I'm aware that I might be able to work around this by having instead a
field with the concatenation of both fields with
CONCAT(field,'\0',text).  But I wanted to check first whether there is
a way I can maintain my relatively understandable table structure
before resorting to hacks.  Basically I just need the data in index
order, starting from a given point... so I figure there is a simple
solution.

Daniel


--
Daniel Noll                            Forensic and eDiscovery Software
Senior Developer                              The world's most advanced
Nuix                                                email data analysis
http://nuix.com/                                and eDiscovery software

Reply via email to