That depends greatly on the overhead you have for executing each select statement. So I wrote a little test that uses my customized apsw library from Python 3. It also works using the as-distributed sqlite3 wrapper (except for the carray interface, which requires my customized apsw to be able to build and pass the object). The overheads associated with each method are included in the elapsed time. The only thing that is clear is that where the overhead of executing each select is significant it is clearly better to execute fewer of them.
>st 1 Method 1: Retrieve Individual Row 00:00:00.103779 Method 2: Individual Row (Sorted) 00:00:00.109945 Method 3: using dynamic in 00:00:00.137431 Method 4: using sorted dynamic in 00:00:00.110824 Method 5: using in carray 00:00:00.171037 Method 5: using in carray sorted 00:00:00.165992 >st 10 Method 1: Retrieve Individual Row 00:00:01.023160 Method 2: Individual Row (Sorted) 00:00:01.187180 Method 3: using dynamic in 00:00:00.159182 Method 4: using sorted dynamic in 00:00:00.175053 Method 5: using in carray 00:00:00.192246 Method 5: using in carray sorted 00:00:00.154138 >st 100 Method 1: Retrieve Individual Row 00:00:10.543783 Method 2: Individual Row (Sorted) 00:00:10.305251 Method 3: using dynamic in 00:00:00.196502 Method 4: using sorted dynamic in 00:00:00.176414 Method 5: using in carray 00:00:00.203340 Method 5: using in carray sorted 00:00:00.191570 >st 1000 Method 1: Retrieve Individual Row 00:01:40.558009 Method 2: Individual Row (Sorted) 00:01:42.051622 Method 3: using dynamic in 00:00:00.246542 Method 4: using sorted dynamic in 00:00:00.238268 Method 5: using in carray 00:00:00.249394 Method 5: using in carray sorted 00:00:00.243244 >st 10000 Method 3: using dynamic in 00:00:00.277059 Method 4: using sorted dynamic in 00:00:00.296931 Method 5: using in carray 00:00:00.297005 Method 5: using in carray sorted 00:00:00.322317 >st 100000 Method 3: using dynamic in 00:00:00.761905 Method 4: using sorted dynamic in 00:00:00.765864 Method 5: using in carray 00:00:00.757057 Method 5: using in carray sorted 00:00:00.691111 >st 1000000 Method 3: using dynamic in 00:00:04.129529 Method 4: using sorted dynamic in 00:00:04.301129 Method 5: using in carray 00:00:04.114985 Method 5: using in carray sorted 00:00:04.417498 And the code: #! python3 import apsw import datetime import random import sqlite3 import sys import time datasize = 1000000 rows = int(sys.argv[1]) elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et - st)).time() db = apsw.Connection('') #db = sqlite3.connect('', isolation_level=None) db.executescript(''' create table x ( id integer primay key, data blob ); insert into x select value, randomblob(500) from generate_series where start=1 and stop=%d; ''' % (datasize,)) rowset = [random.randint(1, datasize) for i in range(rows)] if rows <= 1000: print('Method 1: Retrieve Individual Row', end=' ', flush=True) st = time.time() db.executescript('BEGIN') for key in rowset: for row in db.execute('select * from x where id=?', (key,)): pass db.commit() print(elapsed(st, time.time())) print('Method 2: Individual Row (Sorted)', end=' ', flush=True) st = time.time() db.executescript('BEGIN') for key in sorted(rowset): for row in db.execute('select * from x where id=?', (key,)): pass db.commit() print(elapsed(st, time.time())) print('Method 3: using dynamic in ', end=' ', flush=True) st = time.time() for row in db.execute('select * from x where id in (' + ','.join(map(str, rowset)) + ')'): pass print(elapsed(st, time.time())) print('Method 4: using sorted dynamic in', end=' ', flush=True) st = time.time() for row in db.execute('select * from x where id in (' + ','.join(map(str, sorted(rowset))) + ')'): pass print(elapsed(st, time.time())) print('Method 5: using in carray ', end=' ', flush=True) st = time.time() for row in db.execute('select * from x where id in carray(:l_address, :l_length, :l_type)', apsw.carray('l', rowset)): pass print(elapsed(st, time.time())) print('Method 5: using in carray sorted ', end=' ', flush=True) st = time.time() for row in db.execute('select * from x where id in carray(:l_address, :l_length, :l_type)', apsw.carray('l', sorted(rowset))): pass print(elapsed(st, time.time())) -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-----Original Message----- >From: sqlite-users <sqlite-users-boun...@mailinglists.sqlite.org> On >Behalf Of Jens Alfke >Sent: Friday, 13 September, 2019 10:39 >To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org> >Subject: [sqlite] Fastest way to SELECT on a set of keys? > >If I have a set of primary keys (let's say a few hundred) and need to >fetch data from the table rows with those keys, what's the fastest way to >do so? The options seem to be: > >(a) Execute "SELECT … FROM table WHERE key=?", once for each key. >(b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of >the key strings. > >If I do (a), I can pre-prepare the statement and save the overhead of >compilation. But SQLite has to go through the rest of its work (starting >the virtual machine, b-tree lookup, etc.) once for each key. > >If I do (b), SQLite has less setup work to do, and it could potentially >optimize the b-tree lookup. On the downside, I have to prepare a >statement every time since the RHS of an "IN" isn't substitutable. > >Does anyone have intuition or actual knowledge about which approach is >better? Or know of a 3rd better approach? > >—Jens >_______________________________________________ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users