Using Richard's suggestion of changing the where clause of my query
to

where +utime >= 1146441600 and +utime < 1149120000

did force sqlite to use the index that gave better performance.
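
For reference, the full query with that rewrite (the same query shown
further down, just with the + prefixes added) is

select u_id, sum( data1 ), sum( data2 )
from data where +utime >= 1146441600 and +utime < 1149120000
group by u_id
order by sum( data1 ) desc
limit 10

As I understand it, the unary + keeps the optimizer from using the utime
terms to drive an index lookup, which in this case steered it away from
pk_data and onto ix_data_ut.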

However, I'm seeing some strange behavior that I'm hoping someone can
shed some light on.

With the time period mentioned below, the data set is about 2.5 million
rows and 86K distinct u_id values.  Using the index on (u_id, utime),
the query time was reduced from 13.5 minutes to 26 seconds (not great,
but at least the browser won't time out waiting for a response).

However, with a different time period, I have a much smaller data set of
about 150K rows and 20K distinct u_id values.  Using the index on (u_id,
utime), the query still takes about 20 seconds.  But if the primary
index on (utime, r_id, u_id) is used, the query only takes 0.5 seconds.
Unfortunately, at query time I have no way of knowing how much data will
have to be traversed, so modifying the query to force the use of a
particular index isn't really an option.  Can anyone explain why the
performance is so poor with the smaller data set and the "optimal" index?
Any suggestions on a workaround so that optimal performance can be achieved
with all data set sizes?

Thanks for your help,

Steve

Steve Green wrote:

Sorry, I forgot to mention that I'm using sqlite v3.3.4 on Red Hat Linux
7.3.

Steve

Steve Green wrote:

Hi,

I have the following schema

CREATE TABLE data(
    utime int4,
    r_id int2,
    u_id int4,
    data1 int8,
    data2 int8
);

Each row is uniquely defined by utime, r_id, and u_id, so I have the
following index

CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );

This index is also needed because at update time, I use "insert or
replace" to update the data table.

The type of query that I'm attempting to perform is similar to

select u_id, sum( data1 ), sum( data2 )
from data where utime >= 1146441600 and utime < 1149120000
group by u_id
order by sum( data1 ) desc
limit 10

My current table has about 2.5 million rows and about 86,000 distinct
u_id values in the time period selected, and the query takes about 13.5
minutes.

Performing an explain query plan reveals

0|0|TABLE data WITH INDEX pk_data

so the primary index is being used...
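
For completeness, that output came from running the query prefixed with
"explain query plan", i.e.

explain query plan
select u_id, sum( data1 ), sum( data2 )
from data where utime >= 1146441600 and utime < 1149120000
group by u_id
order by sum( data1 ) desc
limit 10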

Based on some past experiences, I added the following index to the table

CREATE INDEX ix_data_ut on data( u_id, utime );

Note that utime is the last column in the index.  With the primary
index in place, I was not able to convince sqlite to use this index.  To
test the index, I was forced to drop the primary index (which I can't do
in my production environment).  After dropping the primary index, an
explain query plan revealed

0|0|TABLE data WITH INDEX ix_data_ut

and the query ran in 26 seconds...
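
The test amounted to something like the following (done on a copy of the
database, since I can't drop the index in production):

DROP INDEX pk_data;
-- re-run the explain query plan and the query above,
-- then put the unique index back:
CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );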

Subsequent tests using the following indexes provided no performance
improvement over the unique index, even though sqlite's query planner
chose them over the unique index

CREATE INDEX ix_data_tu on data( utime, u_id );
CREATE INDEX ix_data_t on data( utime );

So, is there a way that I can get sqlite to use the optimal index without
having to remove my unique index?  Perhaps this would involve rewriting
the query, but I'm at a loss as to how that could be done.

Thanks for your time,

Steve



--
Steve Green
SAVVIS
Transforming Information Technology SM
