Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?

2019-09-18 Thread E.Pasma
Keith,

The final script produces corresponding results here, only a constant factor 
slower (minimal CPU). The rows per second is useful to summarize the tests for 
various keyset sizes. Below is the average per method with input parameter 5.

meth|rps|note
1|149431|Individual Row 
2|195447|Individual Row (Sorted)
3|167740|Rows ExecuteMany   
3|167740|Rows ExecuteMany Sorted 
4|146503|Using IN temp  
5|149261|Using IN temp (sorted) 
6|137831|Using IN keyset
7|136984|Using IN keyset sorted 
8|170922|Using IN (dynamic)
9|188759|Using IN (sorted)  
A|242761|Using IN CArray
B|274883|Using IN CArray sorted 
C|308547|Using Array JOIN sorted

Hope this is useful to the original poster.

To me SQLite-Python is almost addicting. I learned to use carray now. It 
appears to interface brillantly with Python's array module. Only I have a 
custom carray instead of a custom execute method as you APSW (replaced 
sqlite3_bind_pointer by sqlite3_value_int64, for home use only).

Method C is a JOIN to carray, where the keys are sorted. Order by is not needed 
then.
   select x.* from carray(?,?,'int64') cross join x on id=value
 

Thanks, E. Pasma 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?

2019-09-16 Thread Keith Medcalf

On Monday, 16 September, 2019 14:22, E.Pasma  wrote:

>Stop stop stop

>> create table x
>> (
>>id  integer primay key,
>>datablob
>> );

>I did not see this until searching for the word PRIMARY and not finding
>it. Thus id is not a primary key at all. Probably it is a good habit to
>always add WITHOUT ROWID when there is an explicit primary key. The SQL
>parser would then have reported an error.

>The tests with individual rows must definitely be repeated.

>In my tests the results are closer together:

>Method 1: Retrieve Individual Row 00:00:00.081151 1
>Method 3: using dynamic in00:00:00.060368 9995
>Method 5: using in carray 00:00:00.050884 9995
>Method 5: using carray join   00:00:00.043127 1
>Method 6: Using temp table00:00:00.060808 9995

>(for oarameter = 1)

>I tuned the Python script, using fetchone() in the individual row test.
>And I added a temp table test. In Python this just uses executemany() to
>insert the rowset.

You are right.  What a difference a spelling error makes ... No wonder it took 
so long as it was doing table scans -- and the optimizer was doing a jolly job 
in the other cases in dealing with it.

Note that the sqlite3 wrapper cannot do .executemany() with SELECT statements 
... but it will do them with INSERT statements.  Nevertheless, the results are 
reasonably similar to these obtained with APSW ...

>st 1
Creating db and sample keys: 100 rows; 1 keys
Method 1: Individual Row  00:00:00.000150
Method 2: Individual Row (Sorted) 00:00:00.79
Method 3: Rows with ExecMany  00:00:00.63
Method 3: Rows with ExecMany Sort 00:00:00.61
Method 4: Using IN temp   00:00:00.000323
Method 5: Using IN temp (sorted)  00:00:00.000266
Method 6: Using IN temp no rowid  00:00:00.000284
Method 7: Using IN (dynamic)  00:00:00.78
Method 8: Using IN (sorted)   00:00:00.61
Method 9: Using IN CArray 00:00:00.000195
Method A: Using IN CArray sorted  00:00:00.75

>st 10
Creating db and sample keys: 100 rows; 10 keys
Method 1: Individual Row  00:00:00.000366
Method 2: Individual Row (Sorted) 00:00:00.000309
Method 3: Rows with ExecMany  00:00:00.000293
Method 3: Rows with ExecMany Sort 00:00:00.000305
Method 4: Using IN temp   00:00:00.000418
Method 5: Using IN temp (sorted)  00:00:00.000359
Method 6: Using IN temp no rowid  00:00:00.000430
Method 7: Using IN (dynamic)  00:00:00.000147
Method 8: Using IN (sorted)   00:00:00.000141
Method 9: Using IN CArray 00:00:00.000372
Method A: Using IN CArray sorted  00:00:00.000132

>st 100
Creating db and sample keys: 100 rows; 100 keys
Method 1: Individual Row  00:00:00.002314
Method 2: Individual Row (Sorted) 00:00:00.002144
Method 3: Rows with ExecMany  00:00:00.002010
Method 3: Rows with ExecMany Sort 00:00:00.001807
Method 4: Using IN temp   00:00:00.001042
Method 5: Using IN temp (sorted)  00:00:00.000963
Method 6: Using IN temp no rowid  00:00:00.001004
Method 7: Using IN (dynamic)  00:00:00.000573
Method 8: Using IN (sorted)   00:00:00.000548
Method 9: Using IN CArray 00:00:00.000671
Method A: Using IN CArray sorted  00:00:00.000588

>st 1000
Creating db and sample keys: 100 rows; 1000 keys
Method 1: Individual Row  00:00:00.019247
Method 2: Individual Row (Sorted) 00:00:00.017748
Method 3: Rows with ExecMany  00:00:00.016084
Method 3: Rows with ExecMany Sort 00:00:00.015766
Method 4: Using IN temp   00:00:00.007528
Method 5: Using IN temp (sorted)  00:00:00.007821
Method 6: Using IN temp no rowid  00:00:00.007600
Method 7: Using IN (dynamic)  00:00:00.005317
Method 8: Using IN (sorted)   00:00:00.004884
Method 9: Using IN CArray 00:00:00.005081
Method A: Using IN CArray sorted  00:00:00.005190

>st 1
Creating db and sample keys: 100 rows; 1 keys
Method 1: Individual Row  00:00:00.178937
Method 2: Individual Row (Sorted) 00:00:00.180979
Method 3: Rows with ExecMany  00:00:00.165302
Method 3: Rows with ExecMany Sort 00:00:00.163846
Method 4: Using IN temp   00:00:00.076111
Method 5: Using IN temp (sorted)  00:00:00.076974
Method 6: Using IN temp no rowid  00:00:00.077122
Method 7: Using IN (dynamic)  00:00:00.049132
Method 8: Using IN (sorted)   00:00:00.050656
Method 9: Using IN CArray 00:00:00.052837
Method A: Using IN CArray sorted  00:00:00.050192

>st 10
Creating db and sample keys: 100 rows; 10 keys
Method 1: Individual Row  00:00:01.777458
Method 2: Individual Row (Sorted) 00:00:01.708890
Method 3: Rows with ExecMany  00:00:01.676193
Method 3: Rows with ExecMany Sort 00:00:01.639589
Method 4: Using IN temp   00:00:00.756932
Method 5: Using IN temp (sorted)  00:00:00.742670
Method 6: Using IN temp no rowid  00:00:00.786706
Method 7: Using IN (dynamic)  00:00:00.504242
Method 8: Using IN (sorted)   00:00:00.503634
Method 9: 

Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?

2019-09-16 Thread E.Pasma
Stop stop stop

> create table x
> (
>id  integer primay key,
>datablob
> );

I did not see this until searching for the word PRIMARY and not finding it. 
Thus id is not a primary key at all. Probably it is a good habit to always add 
WITHOUT ROWID when there is an explicit primary key. The SQL parser would then 
have reported an error.   

The tests with individual rows must definitely be repeated.
 
In my tests the results are closer together:

Method 1: Retrieve Individual Row 00:00:00.081151 1
Method 3: using dynamic in00:00:00.060368 9995
Method 5: using in carray 00:00:00.050884 9995
Method 5: using carray join   00:00:00.043127 1
Method 6: Using temp table00:00:00.060808 9995

(for oarameter = 1)

I tuned the Python script, using fetchone() in the individual row test. 
And I added a temp table test. In Python this just uses executemany() to insert 
the rowset.

Thanks, E. Pasma
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?

2019-09-16 Thread Jose Isaias Cabrera

Keith Medcalf, on Monday, September 16, 2019 01:33 PM, wrote...
>
> It will, but that depends how many rows there are.
>
> That is, the statement:  SELECT * FROM t1 WHERE id IN (1,2,3,4,5,6)
>
> Is equivalent to
>
> CREATE TEMPORARY TABLE keyset (key PRIMARY KEY);
> INSERT OR IGNORE INTO keyset VALUES (1), (2), (3), (4), (5), (6);
> SELECT * FROM t1 WHERE id IN keyset;
> DROP TABLE keyset;
>
> without the overhead of parsing the extra create/insert/drop statements.
>  And of course the part:
>
> SELECT * FROM t1 WHERE id IN keyset;
>
> is really
>
> SELECT * FROM t1 JOIN keyset ON t1.id == keyset.key;
>
> which is just sugar for:
>
> SELECT * FROM t1, keyset where t1.id == keyset.key;
>
> That means that the query planner may place the keyset in the outer loop,
> or it may place t1 in the outer loop, depending on which is "better" and
> what else the query is doing.  In other words the list (...) just becomes
> an index containing the values that is subsequently treated as a table (and
> for that purpose the list (1,3,5,7,9) is the same as list
> (9,1,7,3,5,1,9,3,7) -- since it is sorted and unique).  Note that the list
> may contain NULLs but they are ignored.

Wow!  I learned a tone in this beauty!  Saving it! :-)

josé
Thanks
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?

2019-09-16 Thread Keith Medcalf

It will, but that depends how many rows there are.  

That is, the statement:  SELECT * FROM t1 WHERE id IN (1,2,3,4,5,6)

Is equivalent to

CREATE TEMPORARY TABLE keyset (key PRIMARY KEY);
INSERT OR IGNORE INTO keyset VALUES (1), (2), (3), (4), (5), (6);
SELECT * FROM t1 WHERE id IN keyset;
DROP TABLE keyset;

without the overhead of parsing the extra create/insert/drop statements.  And 
of course the part:

SELECT * FROM t1 WHERE id IN keyset;

is really

SELECT * FROM t1 JOIN keyset ON t1.id == keyset.key;

which is just sugar for:

SELECT * FROM t1, keyset where t1.id == keyset.key;

That means that the query planner may place the keyset in the outer loop, or it 
may place t1 in the outer loop, depending on which is "better" and what else 
the query is doing.  In other words the list (...) just becomes an index 
containing the values that is subsequently treated as a table (and for that 
purpose the list (1,3,5,7,9) is the same as list (9,1,7,3,5,1,9,3,7) -- since 
it is sorted and unique).  Note that the list may contain NULLs but they are 
ignored.
 
-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users  On
>Behalf Of Jens Alfke
>Sent: Monday, 16 September, 2019 10:58
>To: SQLite mailing list 
>Subject: Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?
>
>
>
>> On Sep 13, 2019, at 10:57 AM, Hick Gunter  wrote:
>>
>> This is faster if the number of keys in the list is small relative to
>the number of records in the table.
>> If the number of keys is similar to the number of records in the table,
>then a simple full table scan may be faster.
>
>Experimentally, the optimizer seems to choose an index search even with
>the simpler query. I ran this on a test database with about 30k rows.
>
>> explain query plan select * from kv_default where key in ('a','b','c')
>
>3|0|0|SEARCH TABLE kv_default USING INDEX sqlite_autoindex_kv_default_1
>(key=?)
>
>—Jens
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?

2019-09-16 Thread Simon Slavin
On 16 Sep 2019, at 5:58pm, Jens Alfke  wrote:

> Experimentally, the optimizer seems to choose an index search even with the 
> simpler query. I ran this on a test database with about 30k rows.

In case you forgot I'm just reminding you to run ANALYZE after putting your 
data and indexes in.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?

2019-09-16 Thread Jens Alfke


> On Sep 13, 2019, at 10:57 AM, Hick Gunter  wrote:
> 
> This is faster if the number of keys in the list is small relative to the 
> number of records in the table.
> If the number of keys is similar to the number of records in the table, then 
> a simple full table scan may be faster.

Experimentally, the optimizer seems to choose an index search even with the 
simpler query. I ran this on a test database with about 30k rows.

> explain query plan select * from kv_default where key in ('a','b','c')

3|0|0|SEARCH TABLE kv_default USING INDEX sqlite_autoindex_kv_default_1 (key=?)

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Hick Gunter
WITH list (key) AS (VALUES (mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Jens Alfke
Gesendet: Freitag, 13. September 2019 18:39
An: SQLite mailing list 
Betreff: [EXTERNAL] [sqlite] Fastest way to SELECT on a set of keys?

If I have a set of primary keys (let's say a few hundred) and need to fetch 
data from the table rows with those keys, what's the fastest way to do so? The 
options seem to be:

(a) Execute "SELECT … FROM table WHERE key=?", once for each key.
(b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of the 
key strings.

If I do (a), I can pre-prepare the statement and save the overhead of 
compilation. But SQLite has to go through the rest of its work (starting the 
virtual machine, b-tree lookup, etc.) once for each key.

If I do (b), SQLite has less setup work to do, and it could potentially 
optimize the b-tree lookup. On the downside, I have to prepare a statement 
every time since the RHS of an "IN" isn't substitutable.

Does anyone have intuition or actual knowledge about which approach is better? 
Or know of a 3rd better approach?

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users