That depends greatly on the overhead you have for executing each select 
statement.  So I wrote a little test that uses my customized apsw library from 
Python 3.  It also works using the as-distributed sqlite3 wrapper (except for 
the carray interface, which requires my customized apsw to be able to build and 
pass the object).  The overheads associated with each method are included in 
the elapsed time.  The only thing that is clear is that where the overhead of 
executing each select is significant it is clearly better to execute fewer of 
them.

>st 1
Method 1: Retrieve Individual Row 00:00:00.103779
Method 2: Individual Row (Sorted) 00:00:00.109945
Method 3: using dynamic in        00:00:00.137431
Method 4: using sorted dynamic in 00:00:00.110824
Method 5: using in carray         00:00:00.171037
Method 5: using in carray sorted  00:00:00.165992

>st 10
Method 1: Retrieve Individual Row 00:00:01.023160
Method 2: Individual Row (Sorted) 00:00:01.187180
Method 3: using dynamic in        00:00:00.159182
Method 4: using sorted dynamic in 00:00:00.175053
Method 5: using in carray         00:00:00.192246
Method 5: using in carray sorted  00:00:00.154138

>st 100
Method 1: Retrieve Individual Row 00:00:10.543783
Method 2: Individual Row (Sorted) 00:00:10.305251
Method 3: using dynamic in        00:00:00.196502
Method 4: using sorted dynamic in 00:00:00.176414
Method 5: using in carray         00:00:00.203340
Method 5: using in carray sorted  00:00:00.191570

>st 1000
Method 1: Retrieve Individual Row 00:01:40.558009
Method 2: Individual Row (Sorted) 00:01:42.051622
Method 3: using dynamic in        00:00:00.246542
Method 4: using sorted dynamic in 00:00:00.238268
Method 5: using in carray         00:00:00.249394
Method 5: using in carray sorted  00:00:00.243244

>st 10000
Method 3: using dynamic in        00:00:00.277059
Method 4: using sorted dynamic in 00:00:00.296931
Method 5: using in carray         00:00:00.297005
Method 5: using in carray sorted  00:00:00.322317

>st 100000
Method 3: using dynamic in        00:00:00.761905
Method 4: using sorted dynamic in 00:00:00.765864
Method 5: using in carray         00:00:00.757057
Method 5: using in carray sorted  00:00:00.691111

>st 1000000
Method 3: using dynamic in        00:00:04.129529
Method 4: using sorted dynamic in 00:00:04.301129
Method 5: using in carray         00:00:04.114985
Method 5: using in carray sorted  00:00:04.417498


And the code:

#! python3

import apsw
import datetime
import random
import sqlite3
import sys
import time

datasize = 1000000
rows = int(sys.argv[1])

elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et - st)).time()

db = apsw.Connection('')
#db = sqlite3.connect('', isolation_level=None)

db.executescript('''
create table x
(
    id      integer primay key,
    data    blob
);
insert into x select value, randomblob(500) from generate_series where start=1 
and stop=%d;
''' % (datasize,))

rowset = [random.randint(1, datasize) for i in range(rows)]

if rows <= 1000:
    print('Method 1: Retrieve Individual Row', end=' ', flush=True)
    st = time.time()
    db.executescript('BEGIN')
    for key in rowset:
        for row in db.execute('select * from x where id=?', (key,)):
            pass
    db.commit()
    print(elapsed(st, time.time()))

    print('Method 2: Individual Row (Sorted)', end=' ', flush=True)
    st = time.time()
    db.executescript('BEGIN')
    for key in sorted(rowset):
        for row in db.execute('select * from x where id=?', (key,)):
            pass
    db.commit()
    print(elapsed(st, time.time()))

print('Method 3: using dynamic in       ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in (' + ','.join(map(str, 
rowset)) + ')'):
    pass
print(elapsed(st, time.time()))

print('Method 4: using sorted dynamic in', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in (' + ','.join(map(str, 
sorted(rowset))) + ')'):
    pass
print(elapsed(st, time.time()))

print('Method 5: using in carray        ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in carray(:l_address, 
:l_length, :l_type)', apsw.carray('l', rowset)):
    pass
print(elapsed(st, time.time()))

print('Method 5: using in carray sorted ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in carray(:l_address, 
:l_length, :l_type)', apsw.carray('l', sorted(rowset))):
    pass
print(elapsed(st, time.time()))

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <sqlite-users-boun...@mailinglists.sqlite.org> On
>Behalf Of Jens Alfke
>Sent: Friday, 13 September, 2019 10:39
>To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
>Subject: [sqlite] Fastest way to SELECT on a set of keys?
>
>If I have a set of primary keys (let's say a few hundred) and need to
>fetch data from the table rows with those keys, what's the fastest way to
>do so? The options seem to be:
>
>(a) Execute "SELECT … FROM table WHERE key=?", once for each key.
>(b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of
>the key strings.
>
>If I do (a), I can pre-prepare the statement and save the overhead of
>compilation. But SQLite has to go through the rest of its work (starting
>the virtual machine, b-tree lookup, etc.) once for each key.
>
>If I do (b), SQLite has less setup work to do, and it could potentially
>optimize the b-tree lookup. On the downside, I have to prepare a
>statement every time since the RHS of an "IN" isn't substitutable.
>
>Does anyone have intuition or actual knowledge about which approach is
>better? Or know of a 3rd better approach?
>
>—Jens
>_______________________________________________
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to