Re: [sqlite] Re: Query performance issues - index selection

2006-06-05 Thread Steve Green

Joe,

Thanks for the info...

Unfortunately, we don't have the option of moving back to v3.2.1.  However,
I'm curious as well to see the difference in performance, so when (if?) I
get my current performance issues under control, I'll run the test and
post the results.

Steve

Joe Wilson wrote:

If you test against 3.2.1, just use your original where clause 
and original index pk_data:


 where utime >= 1146441600 and utime < 114912

as I don't know what effect the unary +'s would have on it.

--- Joe Wilson <[EMAIL PROTECTED]> wrote:


For my databases, GROUP BY is slower in recent SQLite releases
because my queries are not able to use indexed columns for 
GROUP BY items by design:


http://www.mail-archive.com/sqlite-users%40sqlite.org/msg15563.html

I'm curious what sort of timings you'd get with SQLite 3.2.1.
http://sqlite.org/sqlite-3.2.1.tar.gz

To convert to the older database format:

sqlite334 334.db .dump | sqlite321 321.db

And then perform your timings with both SQLite versions against 
the 321.db file to level the playing field.

Just having an index on utime should be optimal for SQLite 3.2.1.

--- Steve Green <[EMAIL PROTECTED]> wrote:


Using Richard's suggestion of changing the where clause of my query
to

where +utime >= 1146441600 and +utime < 114912000

did force sqlite to use the index that gave better performance.

However, I'm seeing some strange behavior that I'm hoping someone can
shed some light on.

With the time period mentioned below, the data set is about 2.5 million
rows and 86K distinct u_id values.  Using the index on (u_id, utime),
the query time was reduced from 13.5 minutes to 26 seconds (not great,
but at least the browser won't time out waiting for a response).

However, with a different time period, I have a much smaller data set of
about 150K rows and 20K distinct u_id values.  Using the index on (u_id,
utime), the query still takes about 20 seconds.  However, if the primary
index on (utime, r_id, u_id) is used, the query only takes 0.5 seconds.
Unfortunately, at query time I have no way of knowing how much data is
going to have to be traversed, so the idea of modifying the query to force
the use of different indexes is not possible.  Can anyone explain why the
performance is so poor with the smaller data set and the "optimal" index?
Any suggestions on a workaround so that optimal performance can be achieved
with all data set sizes?

Thanks for your help,

Steve

Steve Green wrote:



Sorry, I forgot to mention that I'm using sqlite v3.3.4 on redhat linux
v7.3

Steve

Steve Green wrote:



Hi,

I have the following schema

CREATE TABLE data(
   utime int4,
   r_id int2,
   u_id int4,
   data1 int8,
   data2 int8
);

Each row is uniquely defined by utime, r_id, and u_id, so I have the
following index

CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );

This index is also needed because at update time, I use "insert or
replace" to update the data table.

The type of query that I'm attempting to perform is similar to

select u_id, sum( data1 ), sum( data2 )
from data where utime >= 1146441600 and utime < 114912
group by u_id
order by sum( data1 ) desc
limit 10

My current table has about 2.5 million rows and about 86,000 distinct
u_id values in the time period selected, and the query takes about 13.5
minutes.

Performing an explain query plan reveals

0|0|TABLE data WITH INDEX pk_data

so the primary index is being used...

Based on some past experiences, I added the following index to the table

CREATE INDEX ix_data_ut on data( u_id, utime );

Note that the utime is the last parameter in the index.  With the primary
index in place, I was not able to convince sqlite to use this index.  To test
the index, I was forced to drop the primary index (which I can't do in my
production environment).  After dropping the primary index, an explain query
plan revealed

0|0|TABLE data WITH INDEX ix_data_ut

and the query ran in 26 seconds...

Subsequent tests using the following indexes provided no performance improvement
over the unique index, although sqlite's query planner chose these indexes over
the unique index

CREATE INDEX ix_data_tu on data( utime, u_id );
CREATE INDEX ix_data_t on data( utime );

So, is there a way that I can get sqlite to use the optimal index without having
to remove my unique index?  Perhaps this would involve rewriting the query, but
I'm at a loss as to how that could be done.

Thanks for your time,

Steve






--
Steve Green
SAVVIS
Transforming Information Technology SM

This message contains information which may be confidential and/or
privileged.  Unless you are the intended recipient (or authorized
to receive for the intended recipient), you may not read, use,
copy or disclose to anyone the message or any information contained
in the message. If you have received the message in error, please
advise the sender by reply e-mail at [EMAIL PROTECTED] and
delete the message and any attachment(s) thereto without retaining
any copies.

[sqlite] Re: Query performance issues - index selection

2006-06-02 Thread Steve Green

Using Richard's suggestion of changing the where clause of my query
to

where +utime >= 1146441600 and +utime < 114912000

did force sqlite to use the index that gave better performance.

However, I'm seeing some strange behavior that I'm hoping someone can
shed some light on.

With the time period mentioned below, the data set is about 2.5 million
rows and 86K distinct u_id values.  Using the index on (u_id, utime),
the query time was reduced from 13.5 minutes to 26 seconds (not great,
but at least the browser won't time out waiting for a response).

However, with a different time period, I have a much smaller data set of
about 150K rows and 20K distinct u_id values.  Using the index on (u_id,
utime), the query still takes about 20 seconds.  However, if the primary
index on (utime, r_id, u_id) is used, the query only takes 0.5 seconds.
Unfortunately, at query time I have no way of knowing how much data is
going to have to be traversed, so the idea of modifying the query to force
the use of different indexes is not possible.  Can anyone explain why the
performance is so poor with the smaller data set and the "optimal" index?
Any suggestions on a workaround so that optimal performance can be achieved
with all data set sizes?

Thanks for your help,

Steve

Steve Green wrote:


Sorry, I forgot to mention that I'm using sqlite v3.3.4 on redhat linux
v7.3

Steve

Steve Green wrote:


Hi,

I have the following schema

CREATE TABLE data(
utime int4,
r_id int2,
u_id int4,
data1 int8,
data2 int8
);

Each row is uniquely defined by utime, r_id, and u_id, so I have the
following index

CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );

This index is also needed because at update time, I use "insert or
replace" to update the data table.

The type of query that I'm attempting to perform is similar to

select u_id, sum( data1 ), sum( data2 )
from data where utime >= 1146441600 and utime < 114912
group by u_id
order by sum( data1 ) desc
limit 10

My current table has about 2.5 million rows and about 86,000 distinct
u_id values in the time period selected, and the query takes about 13.5
minutes.

Performing an explain query plan reveals

0|0|TABLE data WITH INDEX pk_data

so the primary index is being used...

Based on some past experiences, I added the following index to the table

CREATE INDEX ix_data_ut on data( u_id, utime );

Note that the utime is the last parameter in the index.  With the primary
index in place, I was not able to convince sqlite to use this index.  To test
the index, I was forced to drop the primary index (which I can't do in my
production environment).  After dropping the primary index, an explain query
plan revealed

0|0|TABLE data WITH INDEX ix_data_ut

and the query ran in 26 seconds...

Subsequent tests using the following indexes provided no performance improvement
over the unique index, although sqlite's query planner chose these indexes over
the unique index

CREATE INDEX ix_data_tu on data( utime, u_id );
CREATE INDEX ix_data_t on data( utime );

So, is there a way that I can get sqlite to use the optimal index without having
to remove my unique index?  Perhaps this would involve rewriting the query, but
I'm at a loss as to how that could be done.

Thanks for your time,

Steve







Re: [sqlite] Query performance issues - index selection

2006-06-02 Thread Steve Green

Kurt,

Thanks, the single primary index you suggested does give us the
desired results.

Steve

Kurt Welgehausen wrote:

Steve Green <[EMAIL PROTECTED]> wrote:



CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );
...
CREATE INDEX ix_data_ut on data( u_id, utime );



Wouldn't a unique index on (u_id, utime, r_id) get you
the pk constraint and better performance on your query
with no other index to confuse things?

Regards
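Kurt's suggestion is easy to check with Python's built-in sqlite3 module: a unique index over the same three columns enforces the same uniqueness regardless of column order, so "insert or replace" keeps working, while leading with u_id puts the index in GROUP BY order.  A sketch (toy data, not the production schema; the index name pk_data2 is made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE data(utime INT4, r_id INT2, u_id INT4, data1 INT8, data2 INT8);
    -- Same three columns as pk_data, reordered to lead with the GROUP BY key
    CREATE UNIQUE INDEX pk_data2 ON data(u_id, utime, r_id);
""")

# Uniqueness over (utime, r_id, u_id) is order-independent, so
# "insert or replace" behaves exactly as it did with pk_data
con.execute("INSERT OR REPLACE INTO data VALUES (1146441600, 1, 42, 10, 20)")
con.execute("INSERT OR REPLACE INTO data VALUES (1146441600, 1, 42, 30, 40)")
print(con.execute("SELECT COUNT(*) FROM data").fetchone()[0])  # 1: the second row replaced the first

plan = " | ".join(r[-1] for r in con.execute("""
    EXPLAIN QUERY PLAN
    SELECT u_id, SUM(data1), SUM(data2) FROM data
    WHERE utime >= 100 AND utime < 200
    GROUP BY u_id ORDER BY SUM(data1) DESC LIMIT 10
"""))
# Whether pk_data2 is chosen here depends on the planner's cost estimates,
# but an index led by u_id delivers rows already grouped, avoiding a sort pass
print(plan)
```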




Re: [sqlite] Query performance issues - index selection

2006-06-01 Thread Steve Green

Richard,

Thanks for the reply.  I did run ANALYZE and that didn't help.  However,
the unary + operator in front of the two utime terms did cause the optimal
index to be used... Can you explain why this works?

I also realized that if I change the "where" clause to something like
where u_id >= 0 and utime >= ..., it will cause the optimal index to be
used (however, if we ever started using negative u_id values, it would
break).

What is the best way to send you sample data (and what is the maximum
file size that you'd like)?

Steve

[EMAIL PROTECTED] wrote:


Steve Green <[EMAIL PROTECTED]> wrote:
[...]


select u_id, sum( data1 ), sum( data2 )
from data where utime >= 1146441600 and utime < 114912
group by u_id
order by sum( data1 ) desc
limit 10



[...]


So, is there a way that I can get sqlite to use the optimal index
without having to remove my unique index? 



Have you run ANALYZE?  That might fix it.

If not, try adding a unary + in front of the two utime
terms in the WHERE clause:

WHERE +utime >= 1145441600 AND +utime < 114912

If you can send me some sample data and queries, that would
be great.  It will help me to improve the optimizer so that
these kinds of things come up less often in the future.
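ANALYZE helps because it populates the sqlite_stat1 table, which the optimizer consults when ranking candidate indexes.  A quick sketch with Python's built-in sqlite3 module (toy table and index names made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE data(utime INT4, u_id INT4);
    CREATE INDEX ix_data_t ON data(utime);
    INSERT INTO data VALUES (1, 1), (2, 1), (3, 2);
""")

# Before ANALYZE there are no statistics for the planner to consult
before = con.execute(
    "SELECT name FROM sqlite_master WHERE name = 'sqlite_stat1'").fetchall()
print(before)  # []

con.execute("ANALYZE")

# ANALYZE creates sqlite_stat1 with one row per analyzed index;
# the stat column holds the row counts the optimizer uses for costing
for tbl, idx, stat in con.execute("SELECT tbl, idx, stat FROM sqlite_stat1"):
    print(tbl, idx, stat)
```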
--
D. Richard Hipp   <[EMAIL PROTECTED]>




[sqlite] Re: Query performance issues - index selection

2006-06-01 Thread Steve Green

Sorry, I forgot to mention that I'm using sqlite v3.3.4 on redhat linux
v7.3

Steve

Steve Green wrote:


Hi,

I have the following schema

CREATE TABLE data(
utime int4,
r_id int2,
u_id int4,
data1 int8,
data2 int8
);

Each row is uniquely defined by utime, r_id, and u_id, so I have the
following index

CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );

This index is also needed because at update time, I use "insert or
replace" to update the data table.

The type of query that I'm attempting to perform is similar to

select u_id, sum( data1 ), sum( data2 )
from data where utime >= 1146441600 and utime < 114912
group by u_id
order by sum( data1 ) desc
limit 10

My current table has about 2.5 million rows and about 86,000 distinct
u_id values in the time period selected, and the query takes about 13.5
minutes.

Performing an explain query plan reveals

0|0|TABLE data WITH INDEX pk_data

so the primary index is being used...

Based on some past experiences, I added the following index to the table

CREATE INDEX ix_data_ut on data( u_id, utime );

Note that the utime is the last parameter in the index.  With the primary
index in place, I was not able to convince sqlite to use this index.  To test
the index, I was forced to drop the primary index (which I can't do in my
production environment).  After dropping the primary index, an explain query
plan revealed

0|0|TABLE data WITH INDEX ix_data_ut

and the query ran in 26 seconds...

Subsequent tests using the following indexes provided no performance improvement
over the unique index, although sqlite's query planner chose these indexes over
the unique index

CREATE INDEX ix_data_tu on data( utime, u_id );
CREATE INDEX ix_data_t on data( utime );

So, is there a way that I can get sqlite to use the optimal index without having
to remove my unique index?  Perhaps this would involve rewriting the query, but
I'm at a loss as to how that could be done.

Thanks for your time,

Steve




[sqlite] Query performance issues - index selection

2006-06-01 Thread Steve Green

Hi,

I have the following schema

CREATE TABLE data(
utime int4,
r_id int2,
u_id int4,
data1 int8,
data2 int8
);

Each row is uniquely defined by utime, r_id, and u_id, so I have the
following index

CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );

This index is also needed because at update time, I use "insert or
replace" to update the data table.

The type of query that I'm attempting to perform is similar to

select u_id, sum( data1 ), sum( data2 )
from data where utime >= 1146441600 and utime < 114912
group by u_id
order by sum( data1 ) desc
limit 10

My current table has about 2.5 million rows and about 86,000 distinct
u_id values in the time period selected, and the query takes about 13.5
minutes.

Performing an explain query plan reveals

0|0|TABLE data WITH INDEX pk_data

so the primary index is being used...

Based on some past experiences, I added the following index to the table

CREATE INDEX ix_data_ut on data( u_id, utime );

Note that the utime is the last parameter in the index.  With the primary
index in place, I was not able to convince sqlite to use this index.  To test
the index, I was forced to drop the primary index (which I can't do in my
production environment).   After dropping the primary index, an explain query
plan revealed

0|0|TABLE data WITH INDEX ix_data_ut

and the query ran in 26 seconds...

Subsequent tests using the following indexes provided no performance improvement
over the unique index, although sqlite's query planner chose these indexes over
the unique index

CREATE INDEX ix_data_tu on data( utime, u_id );
CREATE INDEX ix_data_t on data( utime );

So, is there a way that I can get sqlite to use the optimal index without having
to remove my unique index?  Perhaps this would involve rewriting the query, but
I'm at a loss as to how that could be done.

Thanks for your time,

Steve



Re: [sqlite] DBD::SQLite SQLite Ver 3.2.7

2006-03-15 Thread Steve Green

In order to get this to work, modify DBD::SQLite dbdimp.c so that the 3rd
parameter of each call to sqlite3_prepare() is -1 instead of 0.

Steve

Chris Werner wrote:


Hello,

The current release of SQLite, 3.3.4 does not seem to be compatible with the
perl DBD driver. The embedded SQLite package works quite well, But I would
like to have the full SQLite package installed [so I can use sqlite3]. I do
not seem to be able to locate the source archives, I would be looking for
Ver 3.2.7: sqlite-3.2.7.tar.gz

I understand that the problem is a known bug... any feedback on whether a
change should be made in SQLite or the CPAN DBD package?

Thanks,
Christian Werner





Re: [sqlite] Query parsing differences between v3.2.x and v3.3.x

2006-03-09 Thread Steve Green

Interesting... I googled cross join and got the following definition:

"A cross join (or Cartesian Product join) will return a result table where
 each row from the first table is combined with each row from the second
 table. The number of rows in the result table is the product of the number
 of rows in each table. If the tables involved are large, this join can
 take a very long time."

So this isn't SQLite's definition of a cross join?
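The result rows are the same as that standard definition, but in SQLite CROSS JOIN additionally pins the join order to the order the tables are written in, which is why it can steer the optimizer.  A small sketch with Python's built-in sqlite3 module (simplified two-column tables, not the real schema below):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1(b_id INT4, d_id INT4);
    CREATE TABLE t2(d_id INT4 PRIMARY KEY, r_id INT2);
""")

def plan_steps(sql):
    # One EXPLAIN QUERY PLAN row per table, in the order SQLite will visit them
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

# Plain JOIN: the planner is free to reorder the two tables
print(plan_steps("SELECT * FROM t1 JOIN t2 ON t2.d_id = t1.d_id"))

# CROSS JOIN: t1 must be the outer loop, exactly as written
steps = plan_steps("SELECT * FROM t1 CROSS JOIN t2 ON t2.d_id = t1.d_id")
print(steps)
```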


The schema is as follows:

table1.schema

ATTACH DATABASE './table1.db' AS t1 ;

CREATE TABLE t1.table1
(
i_id INT4,
b_id INT4,
d_id INT4,
c_id INT2,
data_in REAL,
data_out REAL
);
CREATE INDEX t1.ix_table1_b_id ON table1( b_id );

DETACH DATABASE t1 ;

table2.schema

ATTACH DATABASE './table2.db' AS t2 ;

CREATE TABLE t2.table2
(
d_id INT4 PRIMARY KEY,
r_id INT2,
m_id INT2,
i TEXT,
ct TEXT,
cc TEXT,
type TEXT,
notes TEXT
);

DETACH DATABASE t2 ;

The actual query is

select    t1.b_id, t1.c_id, t2.r_id, t2.m_id,
          sum( t1.data_in ) as data_in,
          sum( t1.data_out ) as data_out
from      t1
join      t2
on        t2.d_id = t1.d_id and t1.b_id >= 100 and t1.b_id < 200
group by  t1.b_id, t1.c_id, t2.m_id, t2.r_id

Is this enough information?

Steve

Joe Wilson wrote:

CROSS JOIN is an SQLite-specific thing to disable 
table join optimization.


Please post the schema and indexes of your tables 
so that this bug may be corrected when CROSS JOIN is not used.


--- Steve Green <[EMAIL PROTECTED]> wrote:



Interestingly, using

from   t1
cross join t2

fixes the problem; using it causes indices from both tables to be used...

addr  opcode  p1  p2  p3
  --  --  --  -

37OpenRead0   4
38SetNumColumns   0   7
39Integer 2   0
40OpenRead3   6   keyinfo(1,BINARY)
41Integer 0   0
42OpenRead1   2
43SetNumColumns   1   3
44Integer 0   0
45OpenRead4   3   keyinfo(1,BINARY)


However, using

from   t2
cross join t1

does *not* cause both indices to be used.

As well, using

from t1, t2

which is supposed to be the same as

from   t1
cross join t2

does *not* cause both indices to be used.

This is all rather puzzling; I had also thought that using "join" was equivalent
to "cross join", but this could be ignorance on my behalf...

Steve

Joe Wilson wrote:


Hard to say what's the problem if you don't post the
schema of the tables and the indexes, and provide some
sample data.

Perhaps there is a bug in the join optimizer.
Try using an explicit CROSS JOIN.

select t1.a, t1.b, t2.c, t2.d
from   t1
cross join t2
where  t1.x = t2.x and t1.a >= 100 and t1.a < 200
group by   t1.a, t1.b, t2.c, t2.d


--- Steve Green <[EMAIL PROTECTED]> wrote:




The performance is exactly the same after running analyze on both tables.

Steve

Joe Wilson wrote:



Run an ANALYZE statement on your database and your 
queries will be fast once again.


Perhaps SQLite should fall back to the old non-optimized 
join behavior when the sqlite_stat1 table is missing.
This would cover 99% of the pre-SQLite3.2.3 legacy databases 
out there where the queries have already been hand-tuned.


--- Steve Green <[EMAIL PROTECTED]> wrote:





Hi,

I currently have a v3.2.0 database that contains two tables that I regularly
query with a join, e.g.,

select    t1.a, t1.b, t2.c, t2.d
from      t1
join      t2
on        t1.x = t2.x and t1.a >= 100 and t1.a < 200
group by  t1.a, t1.b, t2.c, t2.d

Table t1 has an index on a
Table t2 has an index on x

I've created a new v3.3.4 database with the same schema and data.  Table t1 has
about 150 million rows and t2 has about 3000 rows.  When I run the query using
v3.2.0, it returns in about 0.1 seconds.  When run using v3.3.4, it takes about
20 seconds.  For completeness, I ran the same query using v3.2.8, v3.3.2, and
v3.3.3.  v3.2.8 performed the same (roughly) as v3.2.0; v3.3.2 and v3.3.3 performed
the same as v3.3.4 (so, it appears that the change was made going into v3.3.x).

When I ran an explain on the query in both databases, the main difference I
noticed was in the opening of the database tables; in v3.2.0, it appears that
the indices of both tables are being used...

addr  opcode  p1  p2  p3
  --  --  --  -

37OpenRead1   4
38SetNumColumns   1   7
39Integer 0   0
40OpenRead3   6   keyinfo(1,BINARY)
41Integer 2   0
42OpenRead0   2
43SetNumColumns   0   8
44   

Re: [sqlite] Query parsing differences between v3.2.x and v3.3.x

2006-03-09 Thread Steve Green

Interestingly, using

from   t1
cross join t2

fixes the problem; using it causes indices from both tables to be used...

addr  opcode  p1  p2  p3
  --  --  --  -

37OpenRead0   4
38SetNumColumns   0   7
39Integer 2   0
40OpenRead3   6   keyinfo(1,BINARY)
41Integer 0   0
42OpenRead1   2
43SetNumColumns   1   3
44Integer 0   0
45OpenRead4   3   keyinfo(1,BINARY)


However, using

from   t2
cross join t1

does *not* cause both indices to be used.

As well, using

from t1, t2

which is supposed to be the same as

from   t1
cross join t2

does *not* cause both indices to be used.

This is all rather puzzling; I had also thought that using "join" was equivalent
to "cross join", but this could be ignorance on my behalf...

Steve

Joe Wilson wrote:

Hard to say what's the problem if you don't post the
schema of the tables and the indexes, and provide some
sample data.

Perhaps there is a bug in the join optimizer.
Try using an explicit CROSS JOIN.

select t1.a, t1.b, t2.c, t2.d
from   t1
cross join t2
where  t1.x = t2.x and t1.a >= 100 and t1.a < 200
group by   t1.a, t1.b, t2.c, t2.d


--- Steve Green <[EMAIL PROTECTED]> wrote:



The performance is exactly the same after running analyze on both tables.

Steve

Joe Wilson wrote:


Run an ANALYZE statement on your database and your 
queries will be fast once again.


Perhaps SQLite should fall back to the old non-optimized 
join behavior when the sqlite_stat1 table is missing.
This would cover 99% of the pre-SQLite3.2.3 legacy databases 
out there where the queries have already been hand-tuned.


--- Steve Green <[EMAIL PROTECTED]> wrote:




Hi,

I currently have a v3.2.0 database that contains two tables that I regularly
query with a join, e.g.,

select    t1.a, t1.b, t2.c, t2.d
from      t1
join      t2
on        t1.x = t2.x and t1.a >= 100 and t1.a < 200
group by  t1.a, t1.b, t2.c, t2.d

Table t1 has an index on a
Table t2 has an index on x

I've created a new v3.3.4 database with the same schema and data.  Table t1 has
about 150 million rows and t2 has about 3000 rows.  When I run the query using
v3.2.0, it returns in about 0.1 seconds.  When run using v3.3.4, it takes about
20 seconds.  For completeness, I ran the same query using v3.2.8, v3.3.2, and
v3.3.3.  v3.2.8 performed the same (roughly) as v3.2.0; v3.3.2 and v3.3.3 performed
the same as v3.3.4 (so, it appears that the change was made going into v3.3.x).

When I ran an explain on the query in both databases, the main difference I
noticed was in the opening of the database tables; in v3.2.0, it appears that
the indices of both tables are being used...

addr  opcode  p1  p2  p3
  --  --  --  -

37OpenRead1   4
38SetNumColumns   1   7
39Integer 0   0
40OpenRead3   6   keyinfo(1,BINARY)
41Integer 2   0
42OpenRead0   2
43SetNumColumns   0   8
44Integer 2   0
45OpenRead4   3   keyinfo(1,BINARY)
46Integer 3795633 0


In v3.3.4, it appears that the index of only one of the tables is being used...

addr  opcode  p1  p2  p3
  --  --  --  -

37OpenRead0   2
38SetNumColumns   0   3
39Integer 0   0
40OpenRead1   4
41SetNumColumns   1   7
42Integer 0   0
43OpenRead3   6   keyinfo(1,BINARY)
44Rewind  0   78
45Integer 3795633 0


It's quite possible that I'm misinterpreting the explain data as I'm fairly new
with it... However, it seems to me that this might explain the huge difference
in performance.

Has anyone else seen this type of performance difference between v3.2.x and v3.3.x?
Any ideas on how we can improve the performance using v3.3.x (we'd really like to
use the most current version of the database, but the performance difference would
be a killer)?

Thanks,

Steve








Re: [sqlite] Query parsing differences between v3.2.x and v3.3.x

2006-03-09 Thread Steve Green

The performance is exactly the same after running analyze on both tables.

Steve

Joe Wilson wrote:

Run an ANALYZE statement on your database and your 
queries will be fast once again.


Perhaps SQLite should fall back to the old non-optimized 
join behavior when the sqlite_stat1 table is missing.
This would cover 99% of the pre-SQLite3.2.3 legacy databases 
out there where the queries have already been hand-tuned.


--- Steve Green <[EMAIL PROTECTED]> wrote:



Hi,

I currently have a v3.2.0 database that contains two tables that I regularly
query with a join, e.g.,

select    t1.a, t1.b, t2.c, t2.d
from      t1
join      t2
on        t1.x = t2.x and t1.a >= 100 and t1.a < 200
group by  t1.a, t1.b, t2.c, t2.d

Table t1 has an index on a
Table t2 has an index on x

I've created a new v3.3.4 database with the same schema and data.  Table t1 has
about 150 million rows and t2 has about 3000 rows.  When I run the query using
v3.2.0, it returns in about 0.1 seconds.  When run using v3.3.4, it takes about
20 seconds.  For completeness, I ran the same query using v3.2.8, v3.3.2, and
v3.3.3.  v3.2.8 performed the same (roughly) as v3.2.0; v3.3.2 and v3.3.3 performed
the same as v3.3.4 (so, it appears that the change was made going into v3.3.x).

When I ran an explain on the query in both databases, the main difference I
noticed was in the opening of the database tables; in v3.2.0, it appears that
the indices of both tables are being used...

addr  opcode  p1  p2  p3
  --  --  --  -

37OpenRead1   4
38SetNumColumns   1   7
39Integer 0   0
40OpenRead3   6   keyinfo(1,BINARY)
41Integer 2   0
42OpenRead0   2
43SetNumColumns   0   8
44Integer 2   0
45OpenRead4   3   keyinfo(1,BINARY)
46Integer 3795633 0


In v3.3.4, it appears that the index of only one of the tables is being used...

addr  opcode  p1  p2  p3
  --  --  --  -

37OpenRead0   2
38SetNumColumns   0   3
39Integer 0   0
40OpenRead1   4
41SetNumColumns   1   7
42Integer 0   0
43OpenRead3   6   keyinfo(1,BINARY)
44Rewind  0   78
45Integer 3795633 0


It's quite possible that I'm misinterpreting the explain data as I'm fairly new
with it... However, it seems to me that this might explain the huge difference
in performance.

Has anyone else seen this type of performance difference between v3.2.x and v3.3.x?
Any ideas on how we can improve the performance using v3.3.x (we'd really like to
use the most current version of the database, but the performance difference would
be a killer)?

Thanks,

Steve









[sqlite] Query parsing differences between v3.2.x and v3.3.x

2006-03-09 Thread Steve Green

Hi,

I currently have a v3.2.0 database that contains two tables that I regularly
query with a join, e.g.,

select    t1.a, t1.b, t2.c, t2.d
from      t1
join      t2
on        t1.x = t2.x and t1.a >= 100 and t1.a < 200
group by  t1.a, t1.b, t2.c, t2.d

Table t1 has an index on a
Table t2 has an index on x

I've created a new v3.3.4 database with the same schema and data.  Table t1 has
about 150 million rows and t2 has about 3000 rows.  When I run the query using
v3.2.0, it returns in about 0.1 seconds.  When run using v3.3.4, it takes about
20 seconds.  For completeness, I ran the same query using v3.2.8, v3.3.2, and
v3.3.3.  v3.2.8 performed the same (roughly) as v3.2.0; v3.3.2 and v3.3.3 performed
the same as v3.3.4 (so, it appears that the change was made going into v3.3.x).

When I ran an explain on the query in both databases, the main difference I
noticed was in the opening of the database tables; in v3.2.0, it appears that
the indices of both tables are being used...

addr  opcode  p1  p2  p3
  --  --  --  -

37OpenRead1   4
38SetNumColumns   1   7
39Integer 0   0
40OpenRead3   6   keyinfo(1,BINARY)
41Integer 2   0
42OpenRead0   2
43SetNumColumns   0   8
44Integer 2   0
45OpenRead4   3   keyinfo(1,BINARY)
46Integer 3795633 0


In v3.3.4, it appears that the index of only one of the tables is being used...

addr  opcode  p1  p2  p3
  --  --  --  -

37OpenRead0   2
38SetNumColumns   0   3
39Integer 0   0
40OpenRead1   4
41SetNumColumns   1   7
42Integer 0   0
43OpenRead3   6   keyinfo(1,BINARY)
44Rewind  0   78
45Integer 3795633 0


It's quite possible that I'm misinterpreting the explain data as I'm fairly new
with it... However, it seems to me that this might explain the huge difference
in performance.

Has anyone else seen this type of performance difference between v3.2.x and v3.3.x?
Any ideas on how we can improve the performance using v3.3.x (we'd really like to
use the most current version of the database, but the performance difference would
be a killer)?

Thanks,

Steve


Re: [sqlite] Compatibility issues with DBD::SQLite and SQLite v3.3.3 and 3.3.4

2006-03-07 Thread Steve Green

Thanks, that seems to have fixed the problem...

Steve

Eric Bohlman wrote:


Nathan Kurz wrote:


On Mon, Mar 06, 2006 at 06:24:13PM -0800, Steve Green wrote:


Hi,

I'm hoping that someone can shed some light on the following issue that
I'm seeing.  When I attempt to create a temp table using DBD::SQLite (v1.11)
and either SQLite v3.3.3 or v3.3.4, I get the following error:

DBD::SQLite::db do failed: not an error(21) at dbdimp.c line 398 at 



Same failures here on Linux 2.4.29.  I didn't have time to debug it,
so I decided just to downgrade to a copy of 3.2.7 that I had around,
and things were fine thereafter.  I presume that some recent change to
sqlite is not yet reflected in DBD::SQLite.  'make test' for
DBD::SQLite is another fine test program that fails.



You'll need to go into dbdimp.c and change the two calls to 
sqlite3_prepare() so that the third argument is -1 rather than zero. 
This is due to the change in check-in 3047.




[sqlite] Compatibility issues with DBD::SQLite and SQLite v3.3.3 and 3.3.4

2006-03-06 Thread Steve Green
 were made.

We are using perl 5.6.1 on Linux 2.6.12.6-nfs #1 SMP.

Thanks for any help you can provide,

Steve
