[sqlite] SQLITE_MAX_MMAP_SIZE 2GB default
Hi there,

My apologies if this has been answered in the past, but is there a reason for the ~2GB default for SQLITE_MAX_MMAP_SIZE?

  #ifndef SQLITE_MAX_MMAP_SIZE
  ...
  #   define SQLITE_MAX_MMAP_SIZE 0x7fff0000  /* 2147418112 */

It makes sense for 32-bit platforms, but for 64-bit builds with larger databases (where 2GB ends up being a small fraction of the db file), the default max effectively takes away the benefit of the feature. I can compile my own binary of course, but it would be convenient to have it supported out of the box. Just wondering if there is a reason not to increase the max for 64-bit builds (to some large value like 1TB)? Especially given that the default mmap size is much lower in any case until it is increased (with PRAGMA mmap_size).

Thanks!

Carl
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
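The interaction between the compile-time cap and the runtime pragma can be observed from any SQLite binding. A minimal sketch using Python's bundled sqlite3 module (note: the value granted depends entirely on how that interpreter's SQLite was compiled; a build with mmap support disabled, or an in-memory database, will report 0):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Ask for a 256 MiB memory map. SQLite clamps the request to the
# compile-time SQLITE_MAX_MMAP_SIZE cap, so a build with the default
# ~2GB cap can grant the full 268435456, while a build with mmap
# disabled (cap 0) grants 0.
row = con.execute("PRAGMA mmap_size=268435456").fetchone()
granted = row[0] if row else 0
print(granted)

# The build's compile options can be listed to look for an explicit cap:
opts = [r[0] for r in con.execute("PRAGMA compile_options")]
print([o for o in opts if "MMAP" in o])
con.close()
```

Whatever value comes back is the effective per-connection limit; requests above the cap are silently reduced rather than rejected.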
Re: [sqlite] Reporting two more interesting queries with performance regression
On 4/6/19, Jinho Jung wrote:
>
> +---------------------+------+
> | Query               | Time |
> +---------------------+------+
> | 10002.sql (v3.23)   |  789 |
> | 10002.sql (v3.27.2) | 1270 |
> +---------------------+------+
> | 10052.sql (v3.23)   | 3094 |
> | 10052.sql (v3.27.2) | 4478 |
> +---------------------+------+
>
> 1) 10002.sql shows 60% performance regression
>    - bisect fossil commit:
>      === 2018-12-31 ===
>      [9fb646f29c] *MERGE* Merge enhancements and bug fixes from trunk.
>      (user: drh tags: reuse-subqueries)

I confirm that your test case is slightly slower on v3.28.0 compared to v3.23.0. But when I do the bisect, I land on a very different (and more plausible) check-in:

https://sqlite.org/src/info/e130319317e76119

Here is the result of my bisect:

https://sqlite.org/src/timeline?bid=y736b53f57fn03f2e78899y8eb62fd5fan9cf8ebd141n0888fc2e88y4cdcda408ay6821c61f1dy4678cb1044n0465d2fc0dn5c188361a9nb57c545a38ne130319317yf856676c84

> 2) 10052.sql shows 40% performance regression
>    - bisect fossil commit:
>      === 2018-12-24 ===
>      [7153552bac] Improvements to EXPLAIN QUERY PLAN formatting. The
>      MULTI-INDEX OR now shows a separate "INDEX" subtree for each index.
>      SCALAR SUBQUERY entries provide a subquery number that is related
>      back to the .selecttrace output. (user: drh tags: reuse-subqueries)

I confirm that there is a slight slowdown here. But for me, this bisect lands on the same check-in:

https://sqlite.org/src/info/e130319317e76119

This is the bisect:

https://sqlite.org/src/timeline?bid=y736b53f57fn03f2e78899y8eb62fd5fan9cf8ebd141n0888fc2e88y4cdcda408ay6821c61f1dy4678cb1044n0465d2fc0dn5c188361a9nb57c545a38ne130319317yf856676c84

--
D. Richard Hipp
d...@sqlite.org
Re: [sqlite] Page Size question
On 4/18/19, Stephen Chrzanowski wrote:
> I'm wondering if there is going to be any optimization internally from
> SQLite by making the page size, say, 180k?

SQLite handles storing 200KB blobs just fine. It has been doing so routinely for time out of mind. Do not stress over the details of how it does so - that is a premature optimization. Just be assured that it works.

Once you get your application up and working, if you want to optimize at that point, simply construct some test databases and manually change the page size to each of the eight different allowed SQLite page sizes (512, 1024, 2048, 4096, 8192, 16384, 32768, and 65536) and check the storage efficiency and performance at each page size. Then use whichever one works best. They should all work correctly - it is only a question of performance and size. And even then, the difference will be marginal.

If you are concerned about unused space in the file, run the "sqlite3_analyzer.exe" utility on a database to get a detailed analysis of the unused space in the database file. Typically, storing blobs in an SQLite database will use *less* space than storing those same blobs as separate files in the filesystem. See https://www.sqlite.org/fasterthanfs.html for further analysis.

--
D. Richard Hipp
d...@sqlite.org
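The experiment described above is easy to script. A sketch using Python's sqlite3 module (the schema and payload here are made up for illustration; the only real constraint is that PRAGMA page_size must be issued before the first write to a fresh database file):

```python
import os
import sqlite3
import tempfile

def db_size_at_page_size(page_size, blob, copies=20):
    """Create a throwaway database with the given page size, store some
    blobs, and report the resulting file size in bytes."""
    path = tempfile.mktemp(suffix=".db")
    con = sqlite3.connect(path)
    con.execute(f"PRAGMA page_size={page_size}")  # must precede first write
    con.execute("CREATE TABLE t(data BLOB)")
    con.executemany("INSERT INTO t VALUES(?)", [(blob,)] * copies)
    con.commit()
    # Verify the requested page size actually took effect.
    assert con.execute("PRAGMA page_size").fetchone()[0] == page_size
    con.close()
    size = os.path.getsize(path)
    os.remove(path)
    return size

blob = os.urandom(200 * 1024)  # a ~200KB payload, as in the question
for ps in (512, 1024, 2048, 4096, 8192, 16384, 32768, 65536):
    print(ps, db_size_at_page_size(ps, blob))
```

Running this against a payload shaped like your real data, and timing a representative read/write workload at each size, answers the question empirically for your specific files.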
[sqlite] Page Size question
When I get my act together and get un-lazy about it, I want to take a large set of "small" files and put them into a SQLite database as blobs in a table, then have my app run some kind of CRC check (CRC32/MD5SUM/SHA1/custom/whatever - haven't decided on a CRC or just byte-for-byte comparisons) to identify byte-exact duplicate files.

I'm not so much concerned about the physical file size of the database file and how the page size can cross 4k-sized cluster boundaries on a disk drive, but a possible optimization on the page size within the database for reading and writing the data from and to the database file. ALL of the files are going to be a multiple of 174,848 bytes. So that, plus that CRC check, plus an integer-based auto-inc'd field, I'm wondering if there is going to be any optimization internally from SQLite by making the page size, say, 180k? (I know, there'll be unused bytes in the page, but that's a low-priority thing -- I'll tighten up that page size and re-import data when I get more to a "production" state.)

My app is going to be on Windows, and I'm not concerned about constraints of memory or CPUs or anything of the sort. I'm just thinking about reading in and out of the database as a POSSIBLE early (not exactly premature, I believe) optimization. I figure that if SQLite is going to read in the 174,848-byte blob, it'd be quicker to read in that one page file into memory rather than take multiple stabs across a fragmented file, of course, having the full understanding that fragmentation will happen at the file system level.

These raw blobs are going to be sitting in a table by themselves, with a separate many-to-one table sitting elsewhere for other nefarious reasons. I know that this table, plus any others, are going to end up sitting in the 180k-sized pages, but beyond that, is there any practical gain going to 180k instead of the default 4k?
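The duplicate-detection part of the plan can be sketched independently of the page-size question. A minimal illustration (the `files` schema and the choice of SHA-256 are assumptions for the sketch, not a recommendation from the thread): store a digest next to each blob, then let SQL group by digest to surface duplicate candidates.

```python
import hashlib
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE files(
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    name   TEXT,
    digest TEXT,    -- SHA-256 of the blob, for cheap duplicate candidates
    data   BLOB)""")

def add_file(name, data):
    con.execute("INSERT INTO files(name, digest, data) VALUES(?,?,?)",
                (name, hashlib.sha256(data).hexdigest(), data))

add_file("a.bin", b"\x00" * 174848)
add_file("b.bin", b"\x00" * 174848)   # byte-exact duplicate of a.bin
add_file("c.bin", b"\x01" * 174848)
con.commit()

# Digest groups with more than one member are duplicate candidates.
# SHA-256 collisions are vanishingly unlikely, but a paranoid pass can
# still compare the blobs byte-for-byte within each group.
dupes = con.execute("""
    SELECT digest, COUNT(*) AS n FROM files
    GROUP BY digest HAVING n > 1""").fetchall()
print(dupes)
```

An index on `digest` keeps the grouping cheap as the table grows; the byte-for-byte confirmation pass only ever touches rows whose digests already matched.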
Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables
On Thu, 18 Apr 2019 at 09:48, Hick Gunter wrote:

> So the QP is attempting to determine which cost product is less:
>
> My guess is it will probably choose the undesired, invalid plan. A linear
> cost penalty for desired but optional inputs is probably not going to cut
> it. In this case, the returned cost needs to be greater than 4 (more if
> there are more records in the IN set). I feel there needs to be a
> multiplicative penalty of at least 10.

As I mentioned in the other post, the direct test with my current implementation and the IN operator gave a good result, but I understand this can't be a rule. I see your points about cost adjustments, so I will give them a try.

I also decided to look at the sources. estimatedCost is mentioned as usable input only once: it goes to the rRun field, being translated with a sqlite3LogEstFromDouble call. The LogEst value is nicely explained in the sources (below is a compacted version of the comment in the sources):

> Estimated quantities used for query planning are stored as 16-bit
> logarithms. For quantity X, the value stored is 10*log2(X). This gives a
> possible range of values of approximately 1.0e986 to 1e-986. But the
> allowed values are "grainy". Not every value is representable. For
> example, quantities 16 and 17 are both represented by a LogEst of 40.
> However, since LogEst quantities are supposed to be estimates, not exact
> values, this imprecision is not a problem.
>
> "LogEst" is short for "Logarithmic Estimate".
>
> Examples:
>      1 -> 0              20 -> 43          10000 -> 132
>      2 -> 10             25 -> 46          25000 -> 146
>      3 -> 16            100 -> 66        1000000 -> 199
>      4 -> 20           1000 -> 99        1048576 -> 200
>     10 -> 33           1024 -> 100     4294967296 -> 320
>
> The LogEst can be negative to indicate fractional values.
> Examples:
>     0.5 -> -10           0.1 -> -33         0.0625 -> -40

It seems that as a general rule it's not good to have close estimatedCost values, since they will sometimes yield the same LogEst values.
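The graininess is easy to reproduce. Below is a rough Python port of the integer algorithm SQLite uses internally (sqlite3LogEst in sqlite3.c); the transcription is mine, so treat it as a sketch rather than a reference implementation. It shows, for example, that 16 and 17 collapse to the same LogEst of 40.

```python
def log_est(x: int) -> int:
    """Approximate 10*log2(x) the way SQLite's sqlite3LogEst() does,
    using only shifts and a small correction table for the low bits."""
    a = [0, 2, 3, 5, 6, 7, 8, 9]   # corrections for the low 3 bits
    y = 40
    if x < 8:
        if x < 2:
            return 0
        while x < 8:
            y -= 10
            x <<= 1
    else:
        while x > 255:
            y += 40
            x >>= 4
        while x > 15:
            y += 10
            x >>= 1
    return a[x & 7] + y - 10

# The "grainy" behaviour from the source comment: 16 and 17 collide.
print(log_est(16), log_est(17))     # both 40
print(log_est(100), log_est(1000))  # 66 99
```

This is why two estimatedCost values that differ by only a few percent can land on the same rRun and become indistinguishable to the planner: a step of one LogEst unit corresponds to a factor of about 2^(1/10), roughly 7%.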
The rRun evaluations in the sources are too complex to analyze.

Max
Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables
So the QP is attempting to determine which cost product is less:

Outer loop IN table (2 records, assuming cost 2) * inner loop VT (2) = 4
Outer loop VT (3) * inner loop IN lookup (1 record, assuming cost 1) = 3

My guess is it will probably choose the undesired, invalid plan. A linear cost penalty for desired but optional inputs is probably not going to cut it. In this case, the returned cost needs to be greater than 4 (more if there are more records in the IN set). I feel there needs to be a multiplicative penalty of at least 10.

Also, you could check the colUsed mask to see if the name_pattern field is referenced and react accordingly.

-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Max Vlasov
Sent: Wednesday, 17 April 2019 14:56
To: SQLite mailing list
Subject: Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

On Wed, 17 Apr 2019 at 15:08, Hick Gunter wrote:

> ...
>
> SELECT contents from textfiles('mypath', NULL, 0);
> SELECT contents from textfiles where search_path = 'mypath' and is_recursive = 1;
> SELECT contents from textfiles where search_path = 'mypath' and name_pattern IN ('*.txt','*.csv');
>
> With xBestIndex returning costs 1, 2, and 2/3 (with IN and without IN)
> respectively.
>
> And you want to know how to make SQLite always call xFilter with
> ('mypath','*.txt') and ('mypath','*.csv') instead of just once with only
> ('mypath') and attempting to retrieve the undefined/empty name_pattern.
> Is this correct?

Yes, you're correct. The name_pattern is also a good illustrative extension to the case.
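The cost comparison above can be sketched numerically. The numbers are the ones assumed in this thread, not anything SQLite guarantees; the point is only that the planner compares products of row counts and per-row costs, so a linear bump to the unconstrained estimate is not enough while a multiplicative one is:

```python
def cheaper(plan_a, plan_b):
    """The planner picks whichever cost product is smaller."""
    return "A" if plan_a < plan_b else "B"

# Plan A: IN list drives the outer loop (2 rows), VT inner (cost 2 each).
# Plan B: VT drives the outer loop (cost 3), IN lookup inner (cost 1).
plan_a = 2 * 2   # 4 -- the desired plan
plan_b = 3 * 1   # 3 -- the invalid plan, yet it looks cheaper
print(cheaper(plan_a, plan_b))   # "B"

# With a 10x multiplicative penalty on the xBestIndex estimate that
# omits the wanted-but-optional constraint:
plan_b_penalized = (3 * 10) * 1  # 30
print(cheaper(plan_a, plan_b_penalized))   # "A"
```

With a merely additive penalty (say 3+2 = 5) the margin stays thin and grows no faster than the IN set, whereas the multiplicative penalty keeps the invalid plan out regardless of how many IN values there are.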
And indeed, the IN operator is really a tricky case.
___
Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0
Re: [sqlite] [EXTERNAL] slow join, fast subselect
On Wed, Apr 17, 2019 at 11:43:13AM -0600, Keith Medcalf wrote:
>
> Your made up plans are intriguing. The plan you show for the latter query
> omits joining a and b. Are you just making things up?

The query plans were cut and pasted from the terminal. It's easy enough to deduce where these plans came from: as someone else pointed out, the ",b" in the second query shouldn't be there, so I removed it before generating the query plans. That step of the query plan is irrelevant anyway.

The point is that in the subselect variant of the query the planner chooses this:

7 0 0 {USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR}

which, given the conditions, is a far better choice than what the planner chooses in the "join" variant:

8 0 0 {SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)}

It would be easy enough again for the planner to deduce this, but as Ryan Smith described, it may not be worth doing in the general case. I don't know. I'm just reporting in from the field.

--
Poor Yorick
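For anyone wanting to reproduce this kind of comparison, a minimal sketch follows. The two-table schema here is hypothetical (the poster's actual tables and data are not in the thread), and the exact plan wording varies between SQLite releases, so only the shape of the output is meaningful:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a(x INTEGER PRIMARY KEY, v);
    CREATE TABLE b(y INTEGER PRIMARY KEY, w);
""")

# Join formulation vs. IN-subselect formulation of the same lookup.
join_plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM a JOIN b ON b.y = a.x WHERE a.v = 1").fetchall()
sub_plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM a WHERE a.x IN (SELECT y FROM b) AND a.v = 1").fetchall()

# Column 3 of each EXPLAIN QUERY PLAN row carries the human-readable
# plan step; printing both lets you compare how b is accessed.
for row in join_plan + sub_plan:
    print(row[3])
```

Populating the tables with data of a realistic shape and running ANALYZE before comparing plans matters: the planner's choice between these formulations depends on the statistics it has, which is exactly why reports from the field like this one are useful.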