Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?
Keith, The final script produces corresponding results here, only a constant factor slower (minimal CPU). The rows per second is useful to summarize the tests for various keyset sizes. Below is the average per method with input parameter 5. meth|rps|note 1|149431|Individual Row 2|195447|Individual Row (Sorted) 3|167740|Rows ExecuteMany 3|167740|Rows ExecuteMany Sorted 4|146503|Using IN temp 5|149261|Using IN temp (sorted) 6|137831|Using IN keyset 7|136984|Using IN keyset sorted 8|170922|Using IN (dynamic) 9|188759|Using IN (sorted) A|242761|Using IN CArray B|274883|Using IN CArray sorted C|308547|Using Array JOIN sorted Hope this is useful to the original poster. To me SQLite-Python is almost addicting. I learned to use carray now. It appears to interface brillantly with Python's array module. Only I have a custom carray instead of a custom execute method as you APSW (replaced sqlite3_bind_pointer by sqlite3_value_int64, for home use only). Method C is a JOIN to carray, where the keys are sorted. Order by is not needed then. select x.* from carray(?,?,'int64') cross join x on id=value Thanks, E. Pasma ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?
On Monday, 16 September, 2019 14:22, E.Pasma wrote: >Stop stop stop >> create table x >> ( >>id integer primay key, >>datablob >> ); >I did not see this until searching for the word PRIMARY and not finding >it. Thus id is not a primary key at all. Probably it is a good habit to >always add WITHOUT ROWID when there is an explicit primary key. The SQL >parser would then have reported an error. >The tests with individual rows must definitely be repeated. >In my tests the results are closer together: >Method 1: Retrieve Individual Row 00:00:00.081151 1 >Method 3: using dynamic in00:00:00.060368 9995 >Method 5: using in carray 00:00:00.050884 9995 >Method 5: using carray join 00:00:00.043127 1 >Method 6: Using temp table00:00:00.060808 9995 >(for oarameter = 1) >I tuned the Python script, using fetchone() in the individual row test. >And I added a temp table test. In Python this just uses executemany() to >insert the rowset. You are right. What a difference a spelling error makes ... No wonder it took so long as it was doing table scans -- and the optimizer was doing a jolly job in the other cases in dealing with it. Note that the sqlite3 wrapper cannot do .executemany() with SELECT statements ... but it will do them with INSERT statements. Nevertheless, the results are reasonably similar to these obtained with APSW ... >st 1 Creating db and sample keys: 100 rows; 1 keys Method 1: Individual Row 00:00:00.000150 Method 2: Individual Row (Sorted) 00:00:00.79 Method 3: Rows with ExecMany 00:00:00.63 Method 3: Rows with ExecMany Sort 00:00:00.61 Method 4: Using IN temp 00:00:00.000323 Method 5: Using IN temp (sorted) 00:00:00.000266 Method 6: Using IN temp no rowid 00:00:00.000284 Method 7: Using IN (dynamic) 00:00:00.78 Method 8: Using IN (sorted) 00:00:00.61 Method 9: Using IN CArray 00:00:00.000195 Method A: Using IN CArray sorted 00:00:00.75 >st 10 Creating db and sample keys: 100 rows; 10 keys Method 1: Individual Row 00:00:00.000366 Method 2: Individual Row (Sorted) 00:00:00.000309 Method 3: Rows with ExecMany 00:00:00.000293 Method 3: Rows with ExecMany Sort 00:00:00.000305 Method 4: Using IN temp 00:00:00.000418 Method 5: Using IN temp (sorted) 00:00:00.000359 Method 6: Using IN temp no rowid 00:00:00.000430 Method 7: Using IN (dynamic) 00:00:00.000147 Method 8: Using IN (sorted) 00:00:00.000141 Method 9: Using IN CArray 00:00:00.000372 Method A: Using IN CArray sorted 00:00:00.000132 >st 100 Creating db and sample keys: 100 rows; 100 keys Method 1: Individual Row 00:00:00.002314 Method 2: Individual Row (Sorted) 00:00:00.002144 Method 3: Rows with ExecMany 00:00:00.002010 Method 3: Rows with ExecMany Sort 00:00:00.001807 Method 4: Using IN temp 00:00:00.001042 Method 5: Using IN temp (sorted) 00:00:00.000963 Method 6: Using IN temp no rowid 00:00:00.001004 Method 7: Using IN (dynamic) 00:00:00.000573 Method 8: Using IN (sorted) 00:00:00.000548 Method 9: Using IN CArray 00:00:00.000671 Method A: Using IN CArray sorted 00:00:00.000588 >st 1000 Creating db and sample keys: 100 rows; 1000 keys Method 1: Individual Row 00:00:00.019247 Method 2: Individual Row (Sorted) 00:00:00.017748 Method 3: Rows with ExecMany 00:00:00.016084 Method 3: Rows with ExecMany Sort 00:00:00.015766 Method 4: Using IN temp 00:00:00.007528 Method 5: Using IN temp (sorted) 00:00:00.007821 Method 6: Using IN temp no rowid 00:00:00.007600 Method 7: Using IN (dynamic) 00:00:00.005317 Method 8: Using IN (sorted) 00:00:00.004884 Method 9: Using IN CArray 00:00:00.005081 Method A: Using IN CArray sorted 00:00:00.005190 >st 1 Creating db and sample keys: 100 rows; 1 keys Method 1: Individual Row 00:00:00.178937 Method 2: Individual Row (Sorted) 00:00:00.180979 Method 3: Rows with ExecMany 00:00:00.165302 Method 3: Rows with ExecMany Sort 00:00:00.163846 Method 4: Using IN temp 00:00:00.076111 Method 5: Using IN temp (sorted) 00:00:00.076974 Method 6: Using IN temp no rowid 00:00:00.077122 Method 7: Using IN (dynamic) 00:00:00.049132 Method 8: Using IN (sorted) 00:00:00.050656 Method 9: Using IN CArray 00:00:00.052837 Method A: Using IN CArray sorted 00:00:00.050192 >st 10 Creating db and sample keys: 100 rows; 10 keys Method 1: Individual Row 00:00:01.777458 Method 2: Individual Row (Sorted) 00:00:01.708890 Method 3: Rows with ExecMany 00:00:01.676193 Method 3: Rows with ExecMany Sort 00:00:01.639589 Method 4: Using IN temp 00:00:00.756932 Method 5: Using IN temp (sorted) 00:00:00.742670 Method 6: Using IN temp no rowid 00:00:00.786706 Method 7: Using IN (dynamic) 00:00:00.504242 Method 8: Using IN (sorted) 00:00:00.503634 Method 9:
Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?
Stop stop stop > create table x > ( >id integer primay key, >datablob > ); I did not see this until searching for the word PRIMARY and not finding it. Thus id is not a primary key at all. Probably it is a good habit to always add WITHOUT ROWID when there is an explicit primary key. The SQL parser would then have reported an error. The tests with individual rows must definitely be repeated. In my tests the results are closer together: Method 1: Retrieve Individual Row 00:00:00.081151 1 Method 3: using dynamic in00:00:00.060368 9995 Method 5: using in carray 00:00:00.050884 9995 Method 5: using carray join 00:00:00.043127 1 Method 6: Using temp table00:00:00.060808 9995 (for oarameter = 1) I tuned the Python script, using fetchone() in the individual row test. And I added a temp table test. In Python this just uses executemany() to insert the rowset. Thanks, E. Pasma ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?
Keith Medcalf, on Monday, September 16, 2019 01:33 PM, wrote... > > It will, but that depends how many rows there are. > > That is, the statement: SELECT * FROM t1 WHERE id IN (1,2,3,4,5,6) > > Is equivalent to > > CREATE TEMPORARY TABLE keyset (key PRIMARY KEY); > INSERT OR IGNORE INTO keyset VALUES (1), (2), (3), (4), (5), (6); > SELECT * FROM t1 WHERE id IN keyset; > DROP TABLE keyset; > > without the overhead of parsing the extra create/insert/drop statements. > And of course the part: > > SELECT * FROM t1 WHERE id IN keyset; > > is really > > SELECT * FROM t1 JOIN keyset ON t1.id == keyset.key; > > which is just sugar for: > > SELECT * FROM t1, keyset where t1.id == keyset.key; > > That means that the query planner may place the keyset in the outer loop, > or it may place t1 in the outer loop, depending on which is "better" and > what else the query is doing. In other words the list (...) just becomes > an index containing the values that is subsequently treated as a table (and > for that purpose the list (1,3,5,7,9) is the same as list > (9,1,7,3,5,1,9,3,7) -- since it is sorted and unique). Note that the list > may contain NULLs but they are ignored. Wow! I learned a tone in this beauty! Saving it! :-) josé Thanks ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?
It will, but that depends how many rows there are. That is, the statement: SELECT * FROM t1 WHERE id IN (1,2,3,4,5,6) Is equivalent to CREATE TEMPORARY TABLE keyset (key PRIMARY KEY); INSERT OR IGNORE INTO keyset VALUES (1), (2), (3), (4), (5), (6); SELECT * FROM t1 WHERE id IN keyset; DROP TABLE keyset; without the overhead of parsing the extra create/insert/drop statements. And of course the part: SELECT * FROM t1 WHERE id IN keyset; is really SELECT * FROM t1 JOIN keyset ON t1.id == keyset.key; which is just sugar for: SELECT * FROM t1, keyset where t1.id == keyset.key; That means that the query planner may place the keyset in the outer loop, or it may place t1 in the outer loop, depending on which is "better" and what else the query is doing. In other words the list (...) just becomes an index containing the values that is subsequently treated as a table (and for that purpose the list (1,3,5,7,9) is the same as list (9,1,7,3,5,1,9,3,7) -- since it is sorted and unique). Note that the list may contain NULLs but they are ignored. -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users On >Behalf Of Jens Alfke >Sent: Monday, 16 September, 2019 10:58 >To: SQLite mailing list >Subject: Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys? > > > >> On Sep 13, 2019, at 10:57 AM, Hick Gunter wrote: >> >> This is faster if the number of keys in the list is small relative to >the number of records in the table. >> If the number of keys is similar to the number of records in the table, >then a simple full table scan may be faster. > >Experimentally, the optimizer seems to choose an index search even with >the simpler query. I ran this on a test database with about 30k rows. > >> explain query plan select * from kv_default where key in ('a','b','c') > >3|0|0|SEARCH TABLE kv_default USING INDEX sqlite_autoindex_kv_default_1 >(key=?) > >—Jens >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?
On 16 Sep 2019, at 5:58pm, Jens Alfke wrote: > Experimentally, the optimizer seems to choose an index search even with the > simpler query. I ran this on a test database with about 30k rows. In case you forgot I'm just reminding you to run ANALYZE after putting your data and indexes in. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?
> On Sep 13, 2019, at 10:57 AM, Hick Gunter wrote: > > This is faster if the number of keys in the list is small relative to the > number of records in the table. > If the number of keys is similar to the number of records in the table, then > a simple full table scan may be faster. Experimentally, the optimizer seems to choose an index search even with the simpler query. I ran this on a test database with about 30k rows. > explain query plan select * from kv_default where key in ('a','b','c') 3|0|0|SEARCH TABLE kv_default USING INDEX sqlite_autoindex_kv_default_1 (key=?) —Jens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?
WITH list (key) AS (VALUES (mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Jens Alfke Gesendet: Freitag, 13. September 2019 18:39 An: SQLite mailing list Betreff: [EXTERNAL] [sqlite] Fastest way to SELECT on a set of keys? If I have a set of primary keys (let's say a few hundred) and need to fetch data from the table rows with those keys, what's the fastest way to do so? The options seem to be: (a) Execute "SELECT … FROM table WHERE key=?", once for each key. (b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of the key strings. If I do (a), I can pre-prepare the statement and save the overhead of compilation. But SQLite has to go through the rest of its work (starting the virtual machine, b-tree lookup, etc.) once for each key. If I do (b), SQLite has less setup work to do, and it could potentially optimize the b-tree lookup. On the downside, I have to prepare a statement every time since the RHS of an "IN" isn't substitutable. Does anyone have intuition or actual knowledge about which approach is better? Or know of a 3rd better approach? —Jens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0 May be privileged. May be confidential. Please delete if not the addressee. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users