Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-17 Thread E . Pasma
> 
> Op 17 sep. 2019, om 04:26 heeft Keith Medcalf  het 
> volgende geschreven:
> 
> 
> On Monday, 16 September, 2019 14:22, E.Pasma  wrote:
> 
>> Stop stop stop
> 
> You are right.  What a difference a spelling error makes ... No wonder it 
> took so long as it was doing table scans -- and the optimizer was doing a 
> jolly job in the other cases in dealing with it.
> 
> Note that the sqlite3 wrapper cannot do .executemany() with SELECT statements 
> ... but it will do them with INSERT statements.  Nevertheless, the results 
> are reasonably similar to those obtained with APSW ...
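The limitation mentioned above is easy to confirm; this is a minimal sketch (not from the original post, table name made up) showing that the standard-library sqlite3 wrapper accepts INSERT but refuses SELECT in executemany():

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('create table x (id integer primary key)')
db.executemany('insert into x values (?)', [(1,), (2,)])  # INSERT works

refused = False
try:
    # SELECT returns rows, which executemany() does not allow.
    db.executemany('select * from x where id=?', [(1,), (2,)])
except sqlite3.ProgrammingError:
    refused = True
print('executemany SELECT refused:', refused)
```

APSW has no such restriction, which is why the benchmark below uses it.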
> 
...
>> st 1000
> Creating db and sample keys: 100 rows; 1000 keys
> Method 1: Individual Row  00:00:00.019247
> Method 2: Individual Row (Sorted) 00:00:00.017748
> Method 3: Rows with ExecMany  00:00:00.016084
> Method 3: Rows with ExecMany Sort 00:00:00.015766
> Method 4: Using IN temp   00:00:00.007528
> Method 5: Using IN temp (sorted)  00:00:00.007821
> Method 6: Using IN temp no rowid  00:00:00.007600
> Method 7: Using IN (dynamic)  00:00:00.005317
> Method 8: Using IN (sorted)   00:00:00.004884
> Method 9: Using IN CArray 00:00:00.005081
> Method A: Using IN CArray sorted  00:00:00.005190
..

> Using this code:
> 
> #! python3
> 
> import apsw
> import datetime
> import random
> import sqlite3
> import sys
> import time
> 
> datasize = 100
> rows = int(sys.argv[1])
> 
> elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et - st)).time()
> tuplize = lambda x: (x,)
> 
> db = apsw.Connection(':memory:')
> #db = sqlite3.connect(':memory:', isolation_level=None)
> 
> print('Creating db and sample keys:', end=' ', flush=True)
> db.executescript('''
> create table x
> (
>    id    integer primary key,
>    data  blob
> );
> insert into x
> with a(x) as (
>select 1
> union all
>select x + 1
>  from a
> where x < %d
> )
> select x, randomblob(30)
>  from a;
> analyze;
> ''' % (datasize,))
> print(db.execute('select count(*) from x').fetchone()[0], 'rows;', end=' ')
> 
> rowset = [i for i in range(1, datasize + 1)]  # ids in x run from 1 to datasize
> random.shuffle(rowset)
> rowset = rowset[:rows]
> print(len(rowset), 'keys')
> 
> print('Method 1: Individual Row ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for key in rowset:
>    row = db.execute('select * from x where id=?', (key,)).fetchone()
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 2: Individual Row (Sorted)', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for key in sorted(rowset):
>    row = db.execute('select * from x where id=?', (key,)).fetchone()
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 3: Rows with ExecMany ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for row in db.executemany('select * from x where id=?', list(map(tuplize, rowset))):
>    pass
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 3: Rows with ExecMany Sort', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for row in db.executemany('select * from x where id=?', list(map(tuplize, sorted(rowset)))):
>    pass
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 4: Using IN temp  ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> db.executescript('create temporary table keys (key)')
> db.executemany('insert into keys values (?)', list(map(tuplize, rowset)))
> for row in db.execute('select * from x where id in temp.keys'):
>    pass
> db.executescript('drop table temp.keys')
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 5: Using IN temp (sorted) ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> db.executescript('create temporary table keys (key)')
> db.executemany('insert into keys values (?)', list(map(tuplize, sorted(rowset))))
> for row in db.execute('select * from x where id in temp.keys'):
>    pass
> db.executescript('drop table temp.keys')
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 6: Using IN temp no rowid ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> db.executescript('create temporary table keys (key primary key) without rowid')
> db.executemany('insert or ignore into keys values (?)', list(map(tuplize, sorted(rowset))))
> for row in db.execute('select * from x where id in temp.keys'):
>    pass
> db.executescript('drop table temp.keys')
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 7: Using IN (dynamic) ', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in (' + ','.join(map(str, rowset)) + ')'):
>    pass
> print(elapsed(st, time.time()))
> 
> print('Method 8: Using IN (sorted)  ', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in (' + ','.join(map(str, sorted(rowset))) + ')'):
>    pass
> print(elapsed(st, time.time()))
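The "IN (dynamic)" idea above can be reproduced with the standard sqlite3 module; this is a minimal sketch (table and sizes are made up, and one placeholder per key is generated instead of concatenating the values into the SQL text):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('create table x (id integer primary key, data blob)')
db.executemany('insert into x values (?, randomblob(30))',
               [(i,) for i in range(1, 101)])

keys = [3, 1, 4, 1, 5, 9, 2, 6]
# Build one IN (...) query with a placeholder per key.
sql = 'select id from x where id in (%s)' % ','.join('?' * len(keys))
rows = db.execute(sql, keys).fetchall()
print(sorted(r[0] for r in rows))
```

IN acts as a set, so the duplicate key 1 yields only one row.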

[sqlite] generate_series can theoretically behave differently in SQLite 3.23.0

2018-04-03 Thread E . Pasma
Hello, below is a case where the generate_series virtual table behaves
differently with SQLite 3.23. It is a purely artificial case. Does
anyone have an opinion about it?

The issue comes from the fact that generate_series does not check whether
a supposed integer input value happens to be NULL, and continues as if
zero were passed in. This way a WHERE condition like

    start = <NULL value>

can be true. Regular tables require "IS" instead of "=" here.
Theoretically the left join reduction prover can now be misled, as
is the case below.

Is this an (obscure) bug in generate_series?

E Pasma

.version
SQLite 3.23.0 2018-03-24 13:24:02  
cf171abe954a5f25262161dd69f2e8cecdbf9446c3f6b298201507dbc743567e

zlib version 1.2.3
gcc-4.0.1 (Apple Inc. build 5465)
.load series
create table t1(x);
create table t2(x,y);
insert into t1 values(1),(2);
insert into t2 values(1,4);
select t1.x, value
from t1
left join t2 using (x)
join generate_series on start=t2.y and stop=5;
1|4
1|5

Same script in earlier version with echo off.

.version
SQLite 3.22.0 2018-01-09 15:28:25  
6b3f4349d7156501f05270d494d6002ee03a0860b6e77b46d17effcab027b753

.echo off
1|4
1|5
2|0
2|1
2|2
2|3
2|4
2|5
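The "=" versus "IS" point for regular tables can be checked with any SQLite build, independent of the series extension; a minimal sketch (hypothetical one-row subquery, run through the Python sqlite3 module):

```python
import sqlite3

db = sqlite3.connect(':memory:')
# For ordinary tables, a comparison with NULL via '=' is never true ...
eq = db.execute(
    'select count(*) from (select null as y) where y = 0').fetchone()[0]
# ... whereas IS treats NULL as a comparable value.
is_null = db.execute(
    'select count(*) from (select null as y) where y is null').fetchone()[0]
print(eq, is_null)
```

generate_series instead silently treats a NULL start as 0, which is what confuses the left join reduction prover above.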

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The upcoming 3.23.0 release

2018-03-24 Thread E . Pasma

Dear Richard, the rather complicated script below used to print a tree:

node1|node2|node3|node4|node5
1||||
1|11|||
1|12|||
1|12|121||
1|13|||

and with yesterday's pre-release snapshot it does not yield any result.
A clue is that this depends on the size of the query, not so much on
the logic. If you reduce the number of anticipated levels, by leaving
out the last four lines, the output is alright.

Hope this causes no headache, E. Pasma

.version
SQLite 3.23.0 2018-03-22 12:00:43  
dd568c27b1d7656388ea5b4132cc0265aedd7348d265d8e8c7412b00b28a31aa

zlib version 1.2.3
gcc-4.0.1 (Apple Inc. build 5465)

create table node (node integer primary key, parent integer)
;
insert into node values (1,0),(11,1),(12,1),(13,1),(121,12)
;
create index node_parent on node (parent)
;
create table bit (bit integer primary key)
;
insert into bit values(0),(1)
;
SELECT  node1, node2, node3, node4, node5
/* level 1 is root */
FROM(SELECT node AS node1 FROM node WHERE parent=0)
/* level 2 */
JOIN(SELECT bit AS bit1 FROM bit)
LEFT JOIN (SELECT node AS node2, parent AS parent2 FROM node)
ON  bit1 AND parent2=node1
/* level 3 */
JOIN(SELECT bit AS bit2 FROM bit)
ON  bit2<=CASE WHEN bit1 THEN CASE WHEN node2 THEN 1 ELSE -1 END ELSE 0 END

LEFT JOIN (SELECT node AS node3, parent AS parent3 FROM node)
ON  bit2 AND parent3=node2
/* level 4 */
JOIN(SELECT bit AS bit3 FROM bit)
ON  bit3<=CASE WHEN bit2 THEN CASE WHEN node3 THEN 1 ELSE -1 END ELSE 0 END

LEFT JOIN (SELECT node AS node4, parent AS parent4 FROM node)
ON  bit3 AND parent4=node3
/* level 5 */
JOIN(SELECT bit AS bit4 FROM bit)
ON  bit4<=CASE WHEN bit3 THEN CASE WHEN node4 THEN 1 ELSE -1 END ELSE 0 END

LEFT JOIN (SELECT node AS node5, parent AS parent5 FROM node)
ON  bit4 AND parent5=node4
;
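As an aside, the same tree can be produced far more compactly with a recursive CTE, which avoids hard-coding the number of levels; a minimal sketch (not from the original post, run here through the Python sqlite3 module, depth-first order obtained via ORDER BY on the path):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript("""
create table node (node integer primary key, parent integer);
insert into node values (1,0),(11,1),(12,1),(13,1),(121,12);
""")
rows = db.execute("""
with recursive tree(node, path) as (
  select node, printf('%d', node) from node where parent = 0
  union all
  select n.node, tree.path || '|' || n.node
  from node n join tree on n.parent = tree.node
  order by 2
)
select path from tree
""").fetchall()
for (path,) in rows:
    print(path)
```

The ORDER BY inside the recursive part turns the work queue into a priority queue, giving depth-first output like the query above.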



[sqlite] View is not flattened when inside an IN sub-query (incl complete sql)

2017-11-16 Thread E . Pasma
Hello, below are two equivalent delete statements. The difference is
that the second version uses views, actually sub-queries, on the base
tables. These are simple one-to-one views that could be flattened out,
as described in

   http://www.sqlite.org/optoverview.html#flattening

The second query plan has a full scan on t2 instead of direct access
via the existing index sqlite_autoindex_t2_1. The view is apparently
not flattened. Is it possible to change this? Thanks, E. Pasma


.version
SQLite 3.21.0 2017-10-02 02:52:54  
c9104b59c7ed360291f7f6fc8caae938e9840c77620d598e4096f78183bf807a

create table t1 (a, b, primary key(a,b));
create table t2 (b, c, primary key(b,c));
create table t3 (a, c, primary key(a,c));

.eqp on
delete from t3 where (a,c) in (
select  a,c
fromt1
joint2 using(b)
);
--EQP-- 0,0,0,SEARCH TABLE t3 USING INDEX sqlite_autoindex_t3_1 (a=? AND c=?)

--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SCAN TABLE t1
--EQP-- 0,1,1,SEARCH TABLE t2 USING COVERING INDEX sqlite_autoindex_t2_1 (b=?)


delete from t3 where (a,c) in (
select  a,c
from(select a,b from t1)
join(select b,c from t2) using(b)
);
--EQP-- 0,0,0,SEARCH TABLE t3 USING INDEX sqlite_autoindex_t3_1 (a=? AND c=?)

--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 1,0,0,SCAN TABLE t1
--EQP-- 2,0,0,SCAN TABLE t2
--EQP-- 0,0,0,SCAN SUBQUERY 1
--EQP-- 0,1,1,SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX (b=?)
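A sketch (made-up data, SELECT instead of DELETE so results are visible) confirming that the two formulations at least agree on results; whether the plan improves depends on the SQLite version in use:

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript("""
create table t1 (a, b, primary key(a,b));
create table t2 (b, c, primary key(b,c));
insert into t1 values (1,10),(2,20);
insert into t2 values (10,100),(20,200);
""")
# Direct join on the base tables.
direct = db.execute(
    'select a, c from t1 join t2 using(b) order by a').fetchall()
# Same join phrased through one-to-one sub-queries.
via_views = db.execute(
    'select a, c from (select a, b from t1)'
    ' join (select b, c from t2) using(b) order by a').fetchall()
print(direct == via_views, direct)
```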



Re: [sqlite] Slow query, with correlated sub-sub-query

2017-07-07 Thread E . Pasma


Keith, this definitely explains the observed time, as it is proportional
to count(a)*count(ab)**2, thus non-linear.

A correlated sub-query is generally recalculated for each row.
But I do not agree with everything.
In my example it is correlated to the outermost query, and not to the
sub-query in which it occurs.
Theoretically the optimizer can take this into account and only
recalculate for each row in the outermost query. And if I'm not
mistaken PostgreSQL does so. Below is a version modified for pgsql that
runs fast no matter the number of rows.
Thanks for the suggested change, where the minimum size is computed in
a sub-query (not sub-sub) and joined to the other sub-query. This is
so elegant. I still need to compare the timing to David's version and
use the fastest.


/* sudo -u postgres psql < issue2p.sql */
drop table if exists a
;
drop table if exists ab
;
create table a (a int, primary key (a))
;
create table ab (a int, b int, size int, primary key (a,b))
;
insert into a
with recursive i as (select 1 as i union all select i+1 from i where i<3)
select i from i
;
insert into ab
with recursive i as (select 1 as i union all select i+1 from i where i<1)
select a, i as b, (a+i)%10 as size from a, i
;
select  a,
(
select  count(*)
fromab
where   a=a.a
and size=(select min(size) from ab where a=a.a)
)
froma
;

Keith Medcalf wrote:
Well of course.  You are aware that a correlated subquery means "for  
each candidate result execute the query"?


So as you have formulated the query it means:

for each row in a
compute the result count which
 for each ab candidate row
 calculate whether it is the minimum

which means that you have requested that the same result be
computed many times over.  You have requested examination of
count(a) * count(ab) * count(ab) rows.


Instead you should be computing the min(size) for each group of a  
once, and using that value in the correlated subquery


select a.a,
   (
   select count(*)
 from ab
where a == a.a
  and size == b.size
   ) as acount
  from a,
   (
   select a,
  min(size) as size
 from ab
 group by a
) as b
where a.a == b.a;

This will result in scanning count(ab) + count(a) * count(ab) rows,
which is significantly less.  On my computer it reduces the
execution time of the original query you posited from 400 ticks to
less than 1 tick (ie, from 500 ms to <8 ms)


I do not know if any optimizer can flatten your original query to any
significant degree.  Some optimizers may arrive at my fixed-up query
because they are capable of doing a hash table lookup on the result
of the innermost correlate.  SQLite does not do that, and without
that capability I do not think there is a relational database query
optimizer on the planet that can help you.
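Keith's rewrite can be sketched on a toy data set (sizes shrunk, schema as in the post, run through the Python sqlite3 module) to confirm that the slow correlated form and the grouped form return the same counts:

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript("""
create table a (a int primary key);
create table ab (a int, b int, size int, primary key (a, b));
insert into a values (1),(2),(3);
insert into ab
  select a.a, i.value, (a.a + i.value) % 10
  from a, (select 1 as value union select 2 union select 3) i;
""")
# Original form: min(size) recomputed inside the correlated sub-query.
slow = db.execute("""
  select a,
         (select count(*) from ab
           where a = a.a
             and size = (select min(size) from ab where a = a.a))
  from a order by a
""").fetchall()
# Rewrite: min(size) computed once per group, then joined.
fast = db.execute("""
  select a.a,
         (select count(*) from ab where a = a.a and size = b.size)
  from a join (select a, min(size) as size from ab group by a) b
    on a.a = b.a
  order by a.a
""").fetchall()
print(slow == fast, fast)
```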



[sqlite] obscure bug: duplicate rowid after insert or replace

2017-05-01 Thread E . Pasma
Hello, I have a duplicate rowid in a 3.16.2 database and this is  
essentially produced as follows:


CREATE TABLE t(i INTEGER PRIMARY KEY, a TEXT)
;
INSERT INTO t VALUES
(NULL, 'generates row 1')
;
REPLACE INTO t VALUES
(NULL, 'generates row 2'),
(1, 'replaces row 1')
;

It is alright after changing the order of the multiple values. Hope
I'm not mistaken. Thanks, E. Pasma
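A sketch (standard sqlite3 module) that runs the same statements and checks for duplicate rowids; on current SQLite versions the check passes, while on the buggy 3.16.2 it reportedly would not:

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript("""
CREATE TABLE t(i INTEGER PRIMARY KEY, a TEXT);
INSERT INTO t VALUES (NULL, 'generates row 1');
REPLACE INTO t VALUES (NULL, 'generates row 2'), (1, 'replaces row 1');
""")
# Two rows should remain, each with a distinct rowid.
total, distinct = db.execute(
    'select count(*), count(distinct rowid) from t').fetchone()
print(total, distinct)
```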




Re: [sqlite] group_replace

2016-08-11 Thread E . Pasma


11 aug 2016, om 13:10, Anthony Lansbergen:


Hello,

I needed a way to make parameterized texts for logging in my current  
project.
For this purpose I missed a function in sqlite: group_replace, so I
made an extension.
The function is, just like group_concat, an aggregate function, but
it replaces keys with values in a string instead of concatenating
strings.


I put the code online, it's publicly available at:

https://github.com/adesys/sqlite3_group_replace_extension

It seems to work fine, but since this is my first extension, can
someone please take a look at it and check whether it is bug-free :-)


thanks in advance,
Anthony Lansbergen

Hello, I am not very experienced, but I do have an easy test to check
for memory leaks. It runs on OS X. For group_replace I tested the
script below, and it shows increasing memory use.
Two remarks/questions: 1. better to use sqlite3_malloc and sqlite3_free
instead of the C primitives.
2. is it imaginable to obtain the same functionality without a C
extension, by using plain replace() inside a recursive CTE?
Thanks, E. Pasma

.load ./sqlite3_group_replace_extension
create table kv (k, v, reverse);
insert into kv values ('1', 'one',0),('2','two',0),('3','knock knock knock',0);

insert into kv select v,k,1 from kv;
create view v as
with r as  (
select 0 as i, '1 2 3 4 5 6 7 8 9' as s
union all
select i+1, (select group_replace (s, k, v) from kv where reverse=i%2)
from r where i<10
)
select * from r where i>=10-1
;
select * from v;
.system ps -o vsz -o command|grep sqlite3$
select * from v;
.system ps -o vsz -o command|grep sqlite3$
select * from v;
.system ps -o vsz -o command|grep sqlite3$
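Regarding question 2: yes, a recursive CTE can emulate the aggregate by applying replace() one kv row at a time; a minimal sketch (plain sqlite3, no extension, simplified kv table without the reverse column):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript("""
create table kv (k, v);
insert into kv values ('1','one'),('2','two'),('3','three');
""")
row = db.execute("""
with r(i, s) as (
  select 0, '1 2 3'
  union all
  -- step i+1 applies the replacement from kv row number i+1
  select i + 1,
         (select replace(r.s, k, v) from kv where rowid = r.i + 1)
  from r
  where i < (select count(*) from kv)
)
select s from r where i = (select count(*) from kv)
""").fetchone()
print(row[0])
```

Note this applies the replacements in rowid order, so overlapping keys behave differently from an unordered aggregate.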



[sqlite] special aggregate feature vs query optimization

2013-01-10 Thread E . Pasma

Hello,

this mail is about the aggregate feature that was unveiled in the
release log of version 3.7.11, http://www.sqlite.org/releaselog/3_7_11.html :


A query of the form: "SELECT max(x), y FROM table" returns the value  
of y on the same row that contains the maximum x value.


I just want to point to a construction where one would expect this to
work, however it does not. I tried a query that returns only the value
of y and intuitively wrote:


select y from (select max(x), y from t);

This however no longer returns the value of y corresponding to the
maximum x.


It looks like a consequence of query optimization. The query satisfies
all conditions for "Query Flattening" in
http://www.sqlite.org/optoverview.html . The max(x) column is then
eliminated.


Hope this is useful to know for who is using the feature.

EPasma
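For reference, the documented behaviour itself is easy to verify (hypothetical table, run through the Python sqlite3 module):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('create table t (x, y)')
db.executemany('insert into t values (?, ?)',
               [(1, 'a'), (3, 'c'), (2, 'b')])
# The bare column y is taken from the row holding the maximum x.
mx, y = db.execute('select max(x), y from t').fetchone()
print(mx, y)
```

The pitfall described above only appears when such a query is wrapped in an outer query that drops the max(x) column.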



Re: [sqlite] SQLite bug ticket - build fails on sun4-solaris-64int 2.10

2010-01-02 Thread E. Pasma
Op 2-jan-2010, om 9:32 heeft Roger Binns het volgende geschreven:

> Darren Duncan wrote:
>> I would like to bring an apparent SQLite bug to the attention of  
>> the SQLite core
>> developers as a ticket, where build fails on sun4-solaris-64int 2.10
>
> You'll find this is not a bug in SQLite.
>
>> cc: Fatal error in /opt/sunstudio12.1/prod/bin/cg
>> cc: Status 139
>> *** Error code 139
>
> That is the compiler crashing (signal 11, SIGSEGV).  This sort of  
> thing
> usually turns out to be an optimiser bug and likely won't happen if  
> you
> disable optimisation, or compile the files individually rather than
> using the amalgamation.  Alternatively use a working compiler like  
> gcc.
>
> Incidentally three of your defines are dodgy:
>
> -DSQLITE_CORE
>
> There is never any need to specify this - all that stuff is handled
> internally.
>
> -DSQLITE_PTR_SZ=4
>
> That name is not used anywhere in the SQLite source I could find.   
> Even
> if it was, implying 4 byte pointers on a 64 bit machine seems  
> dangerous.
>
> -DTHREADSAFE=0
>
> Really?  What is wrong with the default of 1 (which is less likely to
> cause the unwary grief)?
>
> The other flags seem to be guessed.  There is no need to tell a 64 bit
> system that file offsets are 64 bits.  The only 'have' is HAVE_USLEEP but
> the system likely has LOCALTIME_R and GMTIME_R too as well as several
> other header files.
>
> If you do not want to build SQLite using its build system then the
> approach I take is to run SQLite's configure, grab the DEFS = line out
> of the resulting Makefile and generate a .h file with the relevant -D
> turned into #defines.  If you define _HAVE_SQLITE_CONFIG_H then SQLite
> will #include any config.h so you can dump your #defines in there.
>
> Roger


Hello,

just one remark with regard to the thread-safe option: the default
setting has changed in the configure script of version 3.6.21. It has
now become 0 and the --enable-threadsafe argument is required to
configure for thread safety (strongly recommended for shared cache mode).

Regards, Edzard Pasma
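Whether a given build ended up threadsafe can be checked at run time from its compile options; a sketch using the Python wrapper (the exact option list varies per build, so this only inspects what is reported):

```python
import sqlite3

db = sqlite3.connect(':memory:')
# pragma compile_options lists the options the library was built with.
opts = [row[0] for row in db.execute('pragma compile_options')]
threadsafe = [o for o in opts if o.startswith('THREADSAFE')]
print(threadsafe)
```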
