[sqlite] SQLITE_MAX_MMAP_SIZE 2GB default

2019-04-18 Thread Carl Edquist

Hi there,

My apologies if this has been answered in the past, but is there a reason 
for the ~2GB default for SQLITE_MAX_MMAP_SIZE?


#ifndef SQLITE_MAX_MMAP_SIZE
...
#   define SQLITE_MAX_MMAP_SIZE 0x7fff0000  /* 2147418112 */

It makes sense for 32-bit platforms, but for 64-bit builds with larger 
databases (where 2GB ends up being a small fraction of the db file), the 
default max effectively takes away the benefit of the feature.


I can compile my own binary of course, but it would be convenient to have 
it supported out of the box.


Just wondering if there is a reason not to increase the max for 64-bit 
builds (to some large value like 1TB)?  Especially given that the default 
mmap size is much lower in any case until it is increased (with PRAGMA 
mmap_size).



Thanks..!

Carl


Re: [sqlite] Reporting two more interesting queries with performance regression

2019-04-18 Thread Richard Hipp
On 4/6/19, Jinho Jung  wrote:
>
> +----------------------+--------+
> | Query                |   Time |
> +----------------------+--------+
> | 10002.sql (v3.23)    |    789 |
> | 10002.sql (v3.27.2)  |   1270 |
> +----------------------+--------+
> | 10052.sql (v3.23)    |   3094 |
> | 10052.sql (v3.27.2)  |   4478 |
> +----------------------+--------+
>
> 1) 10002.sql shows 60% performance regression
>  - bisect fossil commit:
>    === 2018-12-31 ===
>    [9fb646f29c] *MERGE* Merge enhancements and bug fixes from trunk.
>    (user: drh tags: reuse-subqueries)

I confirm that your test case is slightly slower on v3.28.0 compared
to v3.23.0.  But when I do the bisect, I land on a very different (and
more plausible) check-in: https://sqlite.org/src/info/e130319317e76119

Here is the result of my bisect:
https://sqlite.org/src/timeline?bid=y736b53f57fn03f2e78899y8eb62fd5fan9cf8ebd141n0888fc2e88y4cdcda408ay6821c61f1dy4678cb1044n0465d2fc0dn5c188361a9nb57c545a38ne130319317yf856676c84

>
> 2) 10052.sql shows 40% performance regression
>  - bisect fossil commit:
>    === 2018-12-24 ===
>    [7153552bac] Improvements to EXPLAIN QUERY PLAN formatting. The
>    MULTI-INDEX OR now shows a separate "INDEX" subtree for each index.
>    SCALAR SUBQUERY entries provide a subquery number that is related
>    back to the .selecttrace output. (user: drh tags: reuse-subqueries)

I confirm that there is a slight slowdown here. But for me, this
bisect lands on the same check-in:
https://sqlite.org/src/info/e130319317e76119

This is the bisect:
https://sqlite.org/src/timeline?bid=y736b53f57fn03f2e78899y8eb62fd5fan9cf8ebd141n0888fc2e88y4cdcda408ay6821c61f1dy4678cb1044n0465d2fc0dn5c188361a9nb57c545a38ne130319317yf856676c84

-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Page Size question

2019-04-18 Thread Richard Hipp
On 4/18/19, Stephen Chrzanowski  wrote:
> I'm wondering
> if there is going to be any optimization internally from SQLite by making
> the page size, say, 180k?

SQLite handles storing 200KB blobs just fine.  It has been doing so
routinely for time out of mind.  Do not stress over the details of how
it does so - that is a premature optimization.  Just be assured that
it works.

Once you get your application up and working, if you want to optimize
at that point, simply construct some test databases and manually
change the page size to each of the eight different allowed SQLite
page sizes (512, 1024, 2048, 4096, 8192, 16384, 32768, and 65536) and
check the storage efficiency and performance at each page size.
Then use whichever one works best.  They should all work correctly -
it is only a question of performance and size.  And even then, the
difference will be marginal.
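
A rough sketch of that experiment (file and table names below are
placeholders, and the actual loading and timing are left to the reader):

  #include <stdio.h>
  #include "sqlite3.h"

  int main(void){
    static const int sizes[] = {512,1024,2048,4096,8192,16384,32768,65536};
    char fname[64], sql[64];
    for(int i=0; i<8; i++){
      sqlite3 *db;
      snprintf(fname, sizeof fname, "test-%d.db", sizes[i]);
      if( sqlite3_open(fname, &db)!=SQLITE_OK ) continue;
      /* page_size must be set before any content is written */
      snprintf(sql, sizeof sql, "PRAGMA page_size=%d;", sizes[i]);
      sqlite3_exec(db, sql, 0, 0, 0);
      sqlite3_exec(db,
        "CREATE TABLE t(id INTEGER PRIMARY KEY, data BLOB);", 0, 0, 0);
      /* ... load representative blobs, run the workload, measure ... */
      sqlite3_close(db);
    }
    return 0;
  }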

If you are concerned about unused space in the file, run the
"sqlite3_analyzer.exe" utility on a database to get a detailed
analysis of the unused space in the database file.

Typically, storing blobs in an SQLite database will use *less* space
than storing those same blobs as separate files in the filesystem.
See https://www.sqlite.org/fasterthanfs.html for further analysis.

-- 
D. Richard Hipp
d...@sqlite.org


[sqlite] Page Size question

2019-04-18 Thread Stephen Chrzanowski
When I get my act together and get un-lazy about it, I want to take a large
set of "small" files and put them into an SQLite database as blobs in a table,
then have my app run some kind of CRC check
(CRC32/MD5SUM/SHA1/custom/whatever - I haven't decided on a CRC or just
byte-for-byte comparisons) to identify byte-exact duplicate files.  I'm not
so much concerned about the physical file size of the database file and how
the page size can cross 4k-sized cluster boundaries on a disk drive, but
about a possible optimization of the page size within the database for
reading and writing the data from and to the database file.
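
For the byte-for-byte variant, I'm picturing something along these lines
(table, column, and file names below are placeholders, not a final schema):

  #include <stdio.h>
  #include "sqlite3.h"

  /* Print each pair of rows whose blobs compare equal. */
  static int show(void *unused, int n, char **val, char **col){
    printf("duplicate pair: id %s and id %s\n", val[0], val[1]);
    return 0;
  }

  int main(void){
    sqlite3 *db;
    if( sqlite3_open("files.db", &db)!=SQLITE_OK ) return 1;
    /* BLOB comparison in SQLite is a plain memcmp, so equality here
    ** really is byte-exact. */
    sqlite3_exec(db,
      "SELECT a.id, b.id FROM files a JOIN files b"
      " ON a.data = b.data AND a.id < b.id;",
      show, 0, 0);
    sqlite3_close(db);
    return 0;
  }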

ALL of the files are going to be a multiple of 174,848 bytes.  So that,
plus that CRC check, plus an integer-based auto-inc'd field - I'm wondering
if there is going to be any optimization internally from SQLite by making
the page size, say, 180k?  (I know there'll be unused bytes in the page,
but that's a low-priority thing - I'll tighten up that page size and
re-import the data when I get closer to a "production" state.)

My app is going to be on Windows, and I'm not concerned about constraints
of memory or CPUs or anything of the sort.  I'm just thinking about reading
in and out of the database as a POSSIBLE early (not exactly premature, I
believe) optimization.  I figure that if SQLite is going to read in the
174,848-byte blob, it'd be quicker to read that one page into memory
rather than take multiple stabs across a fragmented file, of course with
the full understanding that fragmentation will happen at the file system
level.

These raw blobs are going to be sitting in a table by themselves, with a
separate many-to-one table sitting elsewhere for other nefarious reasons.
I know that this table, plus any others, is going to end up sitting in the
180k-sized pages, but beyond that, is there any practical gain going to
180k instead of the default 4k?


Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

2019-04-18 Thread Max Vlasov
On Thu, 18 Apr 2019 at 09:48, Hick Gunter  wrote:

> So the QP is attempting to determine which cost product is less:
>
> My guess is it will probably choose the undesired, invalid plan. A linear
> cost penalty for desired but optional inputs is probably not going to cut
> it. In this case, the returned cost needs to be greater than 4 (more if
> there are more records in the IN set). I feel there needs to be a
> multiplicative penalty of at least 10.
>
>
>
As I mentioned in another post, the direct test with my current
implementation and the IN operator gave a good result, but I understand this
can't be taken as a rule.  I see your points about cost adjustments, so I
will give them a try.

I also decided to look at the sources.  estimatedCost is only mentioned once
as usable input: it goes into the rRun field, translated by a
sqlite3LogEstFromDouble call.  The LogEst value is nicely explained in the
sources (below is a compacted version of the comment in the sources):

> Estimated quantities used for query planning are stored as 16-bit
> logarithms.  For quantity X, the value stored is 10*log2(X).  This gives a
> possible range of values of approximately 1.0e986 to 1e-986.  But the
> allowed values are "grainy".  Not every value is representable.  For
> example, quantities 16 and 17 are both represented by a LogEst of 40.
> However, since LogEst quantities are supposed to be estimates, not exact
> values, this imprecision is not a problem.
>
> "LogEst" is short for "Logarithmic Estimate".
>
> Examples:
>      1 -> 0              20 -> 43          10000 -> 132
>      2 -> 10             25 -> 46          25000 -> 146
>      3 -> 16            100 -> 66        1000000 -> 199
>      4 -> 20           1000 -> 99        1048576 -> 200
>     10 -> 33           1024 -> 100    4294967296 -> 320
>
> The LogEst can be negative to indicate fractional values.
>
> Examples:
>    0.5 -> -10           0.1 -> -33        0.0625 -> -40

It seems that, as a general rule, it's not good to have close estimatedCost
values, since they will sometimes collapse to the same LogEst value.
The rRun evaluations in the sources are too complex to analyze.
Max


Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

2019-04-18 Thread Hick Gunter
So the QP is attempting to determine which cost product is less:

Outer loop IN table (2 records, assuming cost 2) * Inner loop VT (2) = 4

Outer loop VT (3) * inner loop IN lookup (1 record, assuming cost 1) = 3

My guess is it will probably choose the undesired, invalid plan. A linear cost 
penalty for desired but optional inputs is probably not going to cut it. In 
this case, the returned cost needs to be greater than 4 (more if there are more 
records in the IN set). I feel there needs to be a multiplicative penalty of at 
least 10.

Also, you could consult the colUsed mask to see whether the name_pattern
field is referenced, and react accordingly.

-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Max Vlasov
Sent: Wednesday, 17 April 2019 14:56
To: SQLite mailing list 
Subject: Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual 
tables

On Wed, 17 Apr 2019 at 15:08, Hick Gunter  wrote:

> ...
>
> SELECT contents from textfiles('mypath', NULL, 0);
> SELECT contents from textfiles where search_path = 'mypath' and
> is_recursive = 1;
> SELECT contents from textfiles where search_path = 'mypath' and
> name_pattern IN ('*.txt','*.csv');
>
> With xBestIndex returning costs 1, 2, and 2/3 (with IN and without IN)
> respectively.
>
> And you want to know how to make SQLite always call xFilter with
> ('mypath','*.txt') and ('mypath','*.csv') instead of just once with only
> ('mypath') and attempting to retrieve the undefined/empty name_pattern.
> Is this correct?
>
>
>
Yes, you're correct.  The name_pattern is also a good illustrative extension
to the case.  And indeed, the IN operator is really a tricky case.


Re: [sqlite] [EXTERNAL] slow join, fast subselect

2019-04-18 Thread Poor Yorick
On Wed, Apr 17, 2019 at 11:43:13AM -0600, Keith Medcalf wrote:
> 
> Your made up plans are intriguing.  The plan you show for the latter query 
> omit to join a and b.  Are you just making things up?

The query plans were cut and pasted from the terminal.  It's easy enough to
deduce where these plans came from: as someone else pointed out, the ",b" in
the second query shouldn't be there, so I removed it before generating the
query plans.  That step of the query plan is irrelevant anyway.  The point is
that in the subselect variant the planner chooses this

  7 0 0 {USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR}
which, given the conditions, is a far better choice than what the planner
chooses in the "join" variant:

  8 0 0 {SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)}

It would be easy enough again for the planner to deduce this, but as Ryan
Smith described, it may not be worth doing in the general case.  I don't
know.  I'm just reporting in from the field.
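
For concreteness, the two shapes under discussion look roughly like the
hypothetical reconstruction below (the real schema differs, and the exact
plan text varies by SQLite version; this only shows how to dump both plans):

  #include <stdio.h>
  #include "sqlite3.h"

  /* EXPLAIN QUERY PLAN rows come back as four columns:
  ** id, parent, notused, detail. */
  static int plan(void *unused, int n, char **val, char **col){
    printf("%s %s %s {%s}\n", val[0], val[1], val[2], val[3]);
    return 0;
  }

  int main(void){
    sqlite3 *db;
    sqlite3_open(":memory:", &db);
    sqlite3_exec(db,
      "CREATE TABLE a(x INTEGER); CREATE TABLE b(y INTEGER);", 0, 0, 0);
    /* "join" variant */
    sqlite3_exec(db,
      "EXPLAIN QUERY PLAN SELECT a.x FROM a, b WHERE b.rowid = a.x;",
      plan, 0, 0);
    /* subselect variant */
    sqlite3_exec(db,
      "EXPLAIN QUERY PLAN SELECT a.x FROM a"
      " WHERE a.x IN (SELECT rowid FROM b);",
      plan, 0, 0);
    sqlite3_close(db);
    return 0;
  }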


-- 
Poor Yorick
