Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-17 Thread RP McMurphy
>When you say the data changes, does the shape actually change? Because all that
>matters to analyze is the shape of the data, not the data itself.

Upon further analysis it appears that the data "shape" is different in 
different periods within the table. That is, some sections have the inverse 
shape to other sections. So it looked like query times would change over time 
but actually they are changing because different portions of the table are 
being accessed. Is it possible to tell sqlite to analyze different sections and 
keep separate a record for each?

>Unlike me, some other people on here are extremely knowledgeable on SQL and 
>specifically the inner
>workings of SQLite - Tell us the full story, we might save you hundreds of 
>hours.

I will try to take advantage of that. Let me see what I am able to say about 
our system and hopefully come up with some concise and pertinent questions.

RP

PS: Some administravia; Does anyone know of a way to reduce the posting delay 
for this list? Is it always like this? Or is it some problem with the gmane 
site in general?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-15 Thread RP McMurphy
I am resending this message below from 3 days ago because it never made it to 
the list.

RP

PS: Messages seem to take a long time to go through the gmane system, at least 
half a day and sometimes more in my experience so far.


On Tue, 11/11/14, RP McMurphy <rpm0...@yahoo.com> wrote:

 Subject: Re: Discrepancy with indexing and WHERE clause with AND/OR
 To: sqlite-users@sqlite.org
 Date: Tuesday, November 11, 2014, 6:31 PM
 
 > If you can provide
 any examples where ANALYZE makes a query slower, I
 suspect the developer team would like
 > to see them.
 > 
 
 After we run analyze and then
 let the process run for a while the DB
 contents change - and it can change quite
 considerably depending
 upon what is
 happening. I suspect that the analyze data gets stale, but
 I
 don't know how to track such things in
 sqlite. Anyhow we can't keep running
 analyze every few minutes because it takes a
 long time to run with our DB
 and it appears
 to block all other actions until it is done.
 
 A this point we are
 considering writing VDBE code directly and bypassing the
 parser. Has anyone else done this? Is it going
 to be a huge ugly
 can-of-worms if we do
 that?
 
 RP
 
  
 
 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Null rowid when using CTEs

2014-11-14 Thread RP McMurphy
When using CTEs the rowid is null. The following returns "--NULL--"

.null --NULL--
with c(x) as (select 1)
select rowid from c;

RP
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-11 Thread RP McMurphy

> <http://www.sqlite.org/lang_analyze.html>

Okay, for my simplified example analyze does improve the times. But for our 
application this does not help, and it also harmed a couple of other queries. 
I'll have to see if I can make a better example schema showing the problem 
because I can't upload the entire 900MB database.

RP


On Sun, 11/9/14, RP McMurphy <rpm0...@yahoo.com> wrote:

 Subject: Discrepancy with indexing and WHERE clause with AND/OR
 To: sqlite-users@sqlite.org
 Date: Sunday, November 9, 2014, 2:35 AM
 
 If there is a large table and we need
 to select a subset of values using a WHERE clause with an
 AND/OR construct sqlite has trouble finding the answer in a
 reasonable time. Breaking the queries down into separate
 SELECT statements speeds up the process exponentially.
 
 For example the following takes a few seconds to return the
 answer 1334. Note that the index (w) is a "low quality"
 index with the arguments in the wrong order. The reason for
 this is explained further down:
 
     with recursive cnt(x) as (select 1 union
 all select x+1 from cnt limit 1000)
     insert into v select x % 3,x from cnt;
 
     create index w on v(z,y);
     select count(*) from v
 where    z = 0 and
            
     (    y between 100 and
 1001000 
            
     or    y between 200
 and 2001000
            
     or    y between 300
 and 3001000
            
     or    y between 400
 and 4001000);
 
 The reason the the "low quality" index is because this data
 is also accessed in a different manner. Namely like this:
 
     select count(*) from v group by z;
 
 Both of the above queries each take about 3 seconds to run.
 I don't think the second query can be made faster, but the
 first query can certainly be much faster even with the "low
 quality" index. Thus:
 
     select 
     (select count(*) from v where z = 0 and y
 between 100 and 1001000) +
     (select count(*) from v where z = 0 and y
 between 200 and 2001000) +
     (select count(*) from v where z = 0 and y
 between 300 and 3001000) +
     (select count(*) from v where z = 0 and y
 between 400 and 4001000);
 
 Now the query returns the result 1334 almost immediately.
 The only difference is that the WHERE clause has been
 manually flattened and broken into separate SELECT
 portions.
 
 When we change the index to "high quality" (u) and put the
 arguments in the other order.
 
     drop index w;
     create index u on v(y,z);
 
 And rerun the query:
 
     select count(*) from v
 where    z = 0 and
            
     (    y between 100 and
 1001000 
            
     or    y between 200
 and 2001000
            
     or    y between 300
 and 3001000
            
     or    y between 400
 and 4001000);
 
 The answer 1334 is returned almost immediately. And
 flattening this query gives no advantage. But now the second
 type of query runs terribly slow because the index is very
 poor for this type of query:
 
     select count(*) from v group by z;
 
 And takes more then 30 seconds to finish.
 
 Trying to make two indices in the hope that sqlite will find
 the optimal one by itself gives back result times the same
 as if only the w index is present:
 
     create index w on v(z,y);
     create index u on v(y,z);
 
 Both queries now take a few seconds to run. It is as if the
 u index does not exist.
 
 In our application we have only created the w index (since
 the u index is awful for the grouping query) and manually
 generate the WHERE/AND/OR flattened queries in a loop. This
 solution is unsatisfying to us and I think we must be doing
 something wrong. Is there a way we can make the w index work
 with both queries and not have to run external loops to
 flatten all the WHERE clauses?
 
 RP
 
 PS: Below is the text in one unit that can be copied and
 pasted into a shell session running sqlite3.exe:
 
 create table times(idx,j);
 create table v(z,y);
 
 with recursive cnt(x) as (select 1 union all select x+1 from
 cnt limit 1000)
 insert into v select x % 3,x from cnt;
 
 create index w on v(z,y);
 select 'With index (z,y)';
 insert into times select 0,julianday('now');
 select 
 (select count(*) from v where z = 0 and y between 100
 and 1001000) +
 (select count(*) from v where z = 0 and y between 200
 and 2001000) +
 (select count(*) from v where z = 0 and y between 300
 and 3001000) +
 (select count(*) from v where z = 0 and y between 400
 and 4001000);
 insert into times select 1,julianday('now');
 select strftime('%f',(select j from times where
 idx=1)-(select j from times where idx=0));
 select count(*) from v where    z = 0 and
            
 (    y between 100 and 1001000 
            
 or    y between 200 and 2001000
            
 or    y between 300 and 3001000
            
 or    y between 400 and 4001000);
 insert into times select 2,julianday('

[sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-10 Thread RP McMurphy
If there is a large table and we need to select a subset of values using a 
WHERE clause with an AND/OR construct sqlite has trouble finding the answer in 
a reasonable time. Breaking the queries down into separate SELECT statements 
speeds up the process exponentially.

For example the following takes a few seconds to return the answer 1334. Note 
that the index (w) is a "low quality" index with the arguments in the wrong 
order. The reason for this is explained further down:

with recursive cnt(x) as (select 1 union all select x+1 from cnt limit 
1000)
insert into v select x % 3,x from cnt;

create index w on v(z,y);
select count(*) from v wherez = 0 and
(   y between 100 and 1001000 
or  y between 200 and 2001000
or  y between 300 and 3001000
or  y between 400 and 4001000);

The reason the the "low quality" index is because this data is also accessed in 
a different manner. Namely like this:

select count(*) from v group by z;

Both of the above queries each take about 3 seconds to run. I don't think the 
second query can be made faster, but the first query can certainly be much 
faster even with the "low quality" index. Thus:

select 
(select count(*) from v where z = 0 and y between 100 and 1001000) +
(select count(*) from v where z = 0 and y between 200 and 2001000) +
(select count(*) from v where z = 0 and y between 300 and 3001000) +
(select count(*) from v where z = 0 and y between 400 and 4001000);

Now the query returns the result 1334 almost immediately. The only difference 
is that the WHERE clause has been manually flattened and broken into separate 
SELECT portions.

When we change the index to "high quality" (u) and put the arguments in the 
other order.

drop index w;
create index u on v(y,z);

And rerun the query:

select count(*) from v wherez = 0 and
(   y between 100 and 1001000 
or  y between 200 and 2001000
or  y between 300 and 3001000
or  y between 400 and 4001000);

The answer 1334 is returned almost immediately. And flattening this query gives 
no advantage. But now the second type of query runs terribly slow because the 
index is very poor for this type of query:

select count(*) from v group by z;

And takes more then 30 seconds to finish.

Trying to make two indices in the hope that sqlite will find the optimal one by 
itself gives back result times the same as if only the w index is present:

create index w on v(z,y);
create index u on v(y,z);

Both queries now take a few seconds to run. It is as if the u index does not 
exist.

In our application we have only created the w index (since the u index is awful 
for the grouping query) and manually generate the WHERE/AND/OR flattened 
queries in a loop. This solution is unsatisfying to us and I think we must be 
doing something wrong. Is there a way we can make the w index work with both 
queries and not have to run external loops to flatten all the WHERE clauses?

RP

PS: Below is the text in one unit that can be copied and pasted into a shell 
session running sqlite3.exe:

create table times(idx,j);
create table v(z,y);

with recursive cnt(x) as (select 1 union all select x+1 from cnt limit 1000)
insert into v select x % 3,x from cnt;

create index w on v(z,y);
select 'With index (z,y)';
insert into times select 0,julianday('now');
select 
(select count(*) from v where z = 0 and y between 100 and 1001000) +
(select count(*) from v where z = 0 and y between 200 and 2001000) +
(select count(*) from v where z = 0 and y between 300 and 3001000) +
(select count(*) from v where z = 0 and y between 400 and 4001000);
insert into times select 1,julianday('now');
select strftime('%f',(select j from times where idx=1)-(select j from times 
where idx=0));
select count(*) from v wherez = 0 and
(   y between 100 and 1001000 
or  y between 200 and 2001000
or  y between 300 and 3001000
or  y between 400 and 4001000);
insert into times select 2,julianday('now');
select strftime('%f',(select j from times where idx=2)-(select j from times 
where idx=1));
select count(*) from v group by z;
insert into times select 3,julianday('now');
select strftime('%f',(select j from times where idx=3)-(select j from times 
where idx=2));

drop index w;
delete from times;

create index u on v(y,z);
select 'With index (y,z)';
insert into times select 0,julianday('now');
select 
(select count(*) from v where z = 0 and y between 100 and 

Re: [sqlite] sqlite3.8.7 crashes when creating an index with WAL mode disabled

2014-11-05 Thread RP McMurphy

>Perhaps this could be caught at compile time.  The process that creates
>the amalgamation could add a preprocessor macro defining e.g.
>_SQLITE_AMALGAMATION_.  The SQLITE_OMIT_* macros could test for that
>definition and #pragma error if found.  

I think there is a much easier solution. Since those four API bindings are 
needed by more than just the WAL code then simply remove the four checks and 
have the APIs always bind.

This appears to be a simple oversight about the usage of the APIs. They are 
used by code other than WAL. Specifically the indexing code also uses the file 
mapping APIs.

RP
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] sqlite3.8.7 crashes when creating an index with WAL mode disabled

2014-11-04 Thread RP McMurphy
Consider the following build cmd line:

gcc -o sqlite3.exe -DSQLITE_OMIT_WAL sqlite3.c shell.c

The build succeeds when using MinGW on Windows.

Now start the built exe with a memory database:

sqlite3 :memory:

And execute the following query:

create table v(y);
with recursive cnt(x) as (select 1 union all select x+1 from cnt limit 
10)
insert into v select * from cnt;
create index w on v(y);

And watch as it crashes when creating the index.

I tracked the problem down to the following missing library bindings:

CreateFileMappingA
CreateFileMappingW
MapViewOfFile
UnmapViewOfFile

All four of those library bindings are disabled by the existence of 
SQLITE_OMIT_WAL. sqlite tries to execute code at address 0 when the table to 
larger than some value. If I use 96255 instead of 10 in the above query 
there is no crash. And increasing by one to 96256 the crash occurs.

RP
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users