[sqlite] Troubleshooting corrupt indexes

2020-04-15 Thread Jens Alfke
We've found a few database instances with index corruption, and I'm unsure how 
to troubleshoot it.

The symptom is SELECT queries failing with SQLITE_CORRUPT. I've looked at the 
database files, and `pragma integrity_check` spits out five "row missing from 
index" errors on one index, and "wrong # of entries" on that same index and 
three others. Running `REINDEX` seems to repair everything; at least, 
integrity_check reports no more problems afterward.

The indexes aren't fancy. The one with the row-missing errors is simply
CREATE UNIQUE INDEX seq ON table (sequence)
where the `sequence` column is declared as plain `sequence INTEGER`.

The other indexes are all of the form
CREATE INDEX … ON table (customfn(body, 'key1'), customfn(body, 
'key2'), …)
where `customfn` is a custom function that extracts key-value data from the 
structured blob in `body`, somewhat like the regular `json_extract` function. 

Granted, this custom function could conceivably cause this sort of error if it 
weren't properly deterministic, but it's been in use for a few years, it's pretty 
well tested, and I've never seen it cause index corruption like this.
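For what it's worth, the failure mode is easy to reproduce if the function's 
output ever drifts between indexing time and checking time. A sketch using 
Python's stdlib sqlite3 (a hypothetical stand-in for `customfn`; the 
`deterministic` flag needs Python 3.8+ and SQLite 3.8.3+):

```python
import sqlite3

db = sqlite3.connect(":memory:")
scale = {"v": 1}
# Registered as deterministic, so SQLite allows it in an index expression.
db.create_function("customfn", 1, lambda body: body * scale["v"],
                   deterministic=True)
db.execute("CREATE TABLE t (body INTEGER)")
db.execute("CREATE INDEX t_custom ON t (customfn(body))")
db.executemany("INSERT INTO t VALUES (?)", [(n,) for n in range(1, 10)])

scale["v"] = 2          # the "deterministic" function now returns new values
# integrity_check recomputes the indexed expression for each row; the stored
# index keys no longer match, so it reports errors instead of "ok".
errors = [row[0] for row in db.execute("PRAGMA integrity_check")]

db.execute("REINDEX")   # rebuilds the index from the current function output
ok = [row[0] for row in db.execute("PRAGMA integrity_check")]
```

REINDEX "repairs" the database here for the same reason it did in my case: it 
simply re-derives the index from whatever the function returns today.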

And I'm baffled how something as simple as the `seq` index could go awry…

—Jens [doggedly denying the existence of the new forum]

PS: We're using SQLite 3.28, statically linked into the app, on Android.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] New SQLite Forum established - this mailing list is deprecated

2020-03-15 Thread Jens Alfke

> On Mar 15, 2020, at 4:54 AM, no...@null.net wrote:
> 
> By chance or by design most emails to this list, like yours above, came
> wrapped at a length suitable for text-based readers. Yet every message
> I've received from the forum is a shocking cut-words-in-half affair in
> my terminal.

There’s a long-established email line-break-encoding convention called 
“format=flowed” (RFC 3676) you can thank for that. It allows the encoded text to 
be broken into 72-column lines while still noting where the actual line breaks are.

This way those newfangled mail clients with the astonishing ability to reflow 
text to the user’s desired width — some even use futuristic “proportional 
fonts” that were recently invented in the 1400s — can work their crazy magic, 
while the typical user reading on a VT100 or Teletype is not inconvenienced.

Seriously, RFC822 email is chock full of little affordances and edge cases like 
this. It’s not the sort of thing one blithely wades into. That’s part of the 
reason I gave for leaving forum software implementation to the experts, or at 
least to those who’ve had years to fix these kinds of mistakes.

—Jens
___


Re: [sqlite] New SQLite Forum established - this mailing list is deprecated

2020-03-12 Thread Jens Alfke


> On Mar 12, 2020, at 1:17 PM, Richard Hipp  wrote:
> 
> I have set up an on-line forum as a replacement for this mailing list:

Oh crap.

> The Forum is powered by Fossil.  

I appreciate that you like to 'eat your own dog food'. However, I strongly 
disagree with your using a homemade forum rather than something like Discourse.

In a messaging system, the user interface is critically important. I don't 
think it matters much whether the SQLite forum can render a page in "about 
0.003s" as it says in the footer. What's important is usability — following 
discussions, finding new content, reading it, and composing messages.

There's a reason many people cling to mailing lists as their preferred 
messaging system: email clients have evolved for nearly 50 years to be good 
messaging clients. If you like mail apps there are really good ones like Apple 
Mail and Outlook, if you like using a website then Gmail etc. are pretty good, 
and if you're a CLI guy there are great terminal-based ones.

It's very easy to slap together some HTML tables and textareas and have a 
functional forum GUI. It will suck, though. The kind of things that make 
web-based forums work well are difficult to do, and in my experience there are 
few implementations that really work well — the only ones that come to mind are 
Discourse, Google Groups, and groups.io.

In a nutshell: by building a forum you're moving way outside your core 
competency. It would be wiser to outsource this to a product that's been built 
for this purpose by people who are really good at it.

Personally, I don't have SQLite questions all that often. I hang out in the 
mailing list because it's easy to follow it in my email client and it's 
convenient to post and reply. The forum, from my brief experience today, is 
really awkward. I may not be showing up there very often.

—Jens
___


Re: [sqlite] Intersecting multiple queries

2020-02-29 Thread Jens Alfke

> On Feb 28, 2020, at 11:49 PM, Hamish Allan  wrote:
> 
> Again, I may be making incorrect assumptions.

Remember the old Knuth quote about the danger of premature optimization. What’s 
the size of your data set? Have you tried making a dummy database of the same 
size and experimenting with queries?

Compiling a statement is very fast in my experience. Milliseconds. Doing it 
once in response to a user command probably won’t be noticeable.

It’s true that LIKE queries tend to be slow because they can’t, in general, be 
optimized. But there are two possibilities you haven’t mentioned:
1. If they all involve prefix matching (`x LIKE 'prefix%'`) then SQLite will use 
an index on column x to limit the search to rows where x starts with 'prefix'.
2. You can create an index on a LIKE expression with a specific pattern, i.e. 
`CREATE INDEX foo ON table (x LIKE '%something%') WHERE (x LIKE '%something%')`. 
I haven't tested this, but it should speed up a query using that specific LIKE 
test.
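A sketch of #1 with Python's stdlib sqlite3 (hypothetical schema; note the 
default LIKE is case-insensitive, so the column needs NOCASE collation, or 
`PRAGMA case_sensitive_like=ON`, for the optimization to kick in):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (x TEXT COLLATE NOCASE)")
db.execute("CREATE INDEX docs_x ON docs (x)")

# The prefix LIKE is rewritten by the planner into an index range search.
plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM docs WHERE x LIKE 'prefix%'"
).fetchall()
detail = " ".join(row[-1] for row in plan)   # e.g. "SEARCH docs USING ..."
```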

FTS does make these types of queries faster, as long as you’re looking for 
whole words or word prefixes.
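And a quick sketch of the word/prefix case with FTS5 (hypothetical table name; 
assumes FTS5 is compiled into your SQLite build):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE ftdocs USING fts5(body)")
db.executemany("INSERT INTO ftdocs VALUES (?)",
               [("the quick brown fox",), ("lazy dogs sleep",)])

# Whole-word match, and prefix match via the trailing '*'.
whole = db.execute("SELECT body FROM ftdocs WHERE ftdocs MATCH 'fox'").fetchall()
prefix = db.execute("SELECT body FROM ftdocs WHERE ftdocs MATCH 'qui*'").fetchall()
```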

—Jens
___


Re: [sqlite] more efficient JSON encoding: idle musing

2020-02-25 Thread Jens Alfke


> On Feb 25, 2020, at 6:12 AM, J Decker  wrote:
> 
> other than that; if space is really a concern, maybe a zip layer?

In my experience, the concern is more about speed than size. Given the raw 
string/blob data from a SQLite column, and a specific property name/path, how 
fast can you find its value, convert it to a format SQLite's query engine 
understands, and return it?

The JSON1 extension's parser seems pretty darn fast, but given the nature of 
JSON, it has to do a bunch of scanning that's O(n) with the length of the data. 
It's generally impossible to find a value in the middle of a JSON string 
without examining every single byte that came before it.

The way to go faster is to use formats that let you jump through the structure 
faster. For example, tokenizing dictionary keys and encoding an object's keys 
as a fixed-width sorted list lets you look up a key in O(log n) time, and the 
constant factor is very small because you're comparing integers not strings.
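A toy sketch of that lookup in Python (a hypothetical encoding; real formats 
like Fleece are considerably more involved):

```python
import bisect

# Global table mapping common key strings to small integer tokens.
shared_keys = {"age": 0, "city": 1, "name": 2}

def encode(obj):
    # An "object" becomes a list of (key_token, value) pairs sorted by token.
    return sorted((shared_keys[k], v) for k, v in obj.items())

def lookup(encoded, key):
    # O(log n) binary search comparing small integers, not strings.
    token = shared_keys[key]
    i = bisect.bisect_left(encoded, (token,))
    if i < len(encoded) and encoded[i][0] == token:
        return encoded[i][1]
    return None

rec = encode({"name": "Ada", "age": 36, "city": "London"})
```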

I don't know if extracting 'classes' will help much in a query. The data will 
be smaller, and it makes it extremely fast to look up values if you know the 
class ahead of time, but in a query you don't know the class. Compared to what 
I described above, there's an extra step where you have to look up the class 
description from the object.

I also worry about use cases where the number of 'classes' becomes unwieldy, 
because the schema might be using a huge set of keys. For example, something 
where a customer order contains an object that maps SKUs to quantities, like 
{"A73563M1": 3, "A73522M0": 7, …}. And there are tens of thousands of SKUs.

—Jens
___


[sqlite] Setting auto_vacuum=2 doesn't work after setting journal_mode=WAL

2020-02-21 Thread Jens Alfke
I just discovered that none of our databases are auto-vacuuming, even though we 
set auto_vacuum to 'incremental' when a database is created, and periodically 
call 'pragma incremental_vacuum'. If I go into the CLI, open a database and run 
"pragma auto_vacuum", it returns 0.

After some experimentation I've discovered that (at least in 3.28) `PRAGMA 
auto_vacuum=incremental` has to be done as the very first thing after creating 
a database, even before `PRAGMA journal_mode=WAL`. I know it's documented that 
auto_vacuum has to be enabled before creating any tables, but the docs say 
nothing about ordering wrt other pragmas!

To be precise, this is what we currently do after creating a new database, 
which does not work:

sqlite3_exec(db, "PRAGMA journal_mode=WAL; "
                 "PRAGMA auto_vacuum=incremental; "  // ⟵ will have no effect
                 "BEGIN; "
                 "CREATE TABLE …….. ; "
                 "PRAGMA user_version=302; "
                 "END;");

If I swap the first two statements, it does work:

sqlite3_exec(db, "PRAGMA auto_vacuum=incremental; "  // ⟵ will take effect
                 "PRAGMA journal_mode=WAL; "
                 "BEGIN; "
                 "CREATE TABLE …….. ; "
                 "PRAGMA user_version=302; "
                 "END;");

Is this expected? If so, the docs for auto_vacuum should be updated.
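A self-contained repro of the ordering dependence, using Python's stdlib sqlite3 
(hypothetical file names; WAL mode needs a real file, so this writes to a temp 
directory):

```python
import os
import sqlite3
import tempfile

def auto_vacuum_after(path, pragmas):
    """Open a new database, run the pragmas in order, create a table,
    and report the resulting auto_vacuum setting (0=none, 2=incremental)."""
    db = sqlite3.connect(path)
    for p in pragmas:
        db.execute(p)
    db.execute("CREATE TABLE t (x)")
    db.commit()
    (av,) = db.execute("PRAGMA auto_vacuum").fetchone()
    db.close()
    return av

d = tempfile.mkdtemp()
# WAL first: journal_mode=WAL initializes the database file, freezing
# auto_vacuum at its default of 0.
wal_first = auto_vacuum_after(os.path.join(d, "a.db"),
    ["PRAGMA journal_mode=WAL", "PRAGMA auto_vacuum=incremental"])
# auto_vacuum first: the setting is baked in when the file is initialized.
vacuum_first = auto_vacuum_after(os.path.join(d, "b.db"),
    ["PRAGMA auto_vacuum=incremental", "PRAGMA journal_mode=WAL"])
```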

—Jens
___


Re: [sqlite] more efficient JSON encoding: idle musing

2020-02-21 Thread Jens Alfke


> On Feb 21, 2020, at 4:20 AM, Wout Mertens  wrote:
> 
> I was wondering if the JSON extension could not do the same thing: for each
> table, keep a hidden stash of object layouts, and store the values as
> sqlite primitives. (you'd be able to disable this, in case the layouts
> rarely repeat)

We do something a bit like this in Couchbase Lite[1]. We don't directly store 
JSON, we store a binary data format we designed called Fleece, which has the 
same semantics but is faster to traverse and more compact.

https://github.com/couchbaselabs/fleece

In Fleece a JSON object is represented as an array of fixed-width {key, value} 
pairs.
- The keys are typically 16-bit integers that map into a global (per-database) 
table of common key strings.
- The values are either inline if they fit (small ints, boolean, null) or else 
offsets to where the real data is.
The array is sorted by key, so lookup is a binary search, usually on 16-bit 
ints.

Numbers are stored in a compact variable-width encoding. Strings are unescaped 
UTF-8, and multiple occurrences of the same string are only stored once.

(I've recently discovered that this is similar to FlexBuffers. Fleece has been 
around since 2015, so I don't know if the designers of FlexBuffers looked at 
it…)

I've thought about the 'layouts' concept but haven't tried implementing it yet. 
(Most JavaScript runtimes use something like it, btw.) It would only save about 
4*n bytes per record, where n is the average number of keys in a layout, and a 
small number of clock cycles per key lookup.

> On Feb 21, 2020, at 6:03 AM, Richard Hipp  wrote:
> 
> I experimented with a number of similar ideas for storing JSON when I
> was first designing the JSON components for SQLite.  I was never able
> to find anything that was as fast or as compact as just storing the
> original JSON text.

I had similar experiences with BSON, Bencode, etc. In my use case I found that 
the major expense wasn't parsing, rather allocating an object tree. So I 
designed the Fleece format to be fully useable in-place without requiring 
parsing. A secondary factor is that traversing objects and arrays is expensive 
because the items are variable width, so you have to parse through all the 
(potentially nested) items before the one you're looking for. That's why Fleece 
objects and arrays are fixed-width, using offsets to point to variable-size 
values.

—Jens

[1] https://www.couchbase.com/products/mobile
___


Re: [sqlite] more efficient JSON encoding: idle musing

2020-02-21 Thread Jens Alfke


> On Feb 21, 2020, at 4:20 AM, Wout Mertens  wrote:
> 
> In JavaScript, objects are key-value collections with unique keys, where the 
> order of the keys is important.

JSON is not JavaScript. The order of keys is NOT significant in JSON, and many, 
many JSON implementations parse JSON objects into data structures* that do not 
preserve key order. Clients using such implementations cannot determine the 
original order of keys, nor can they re-serialize the object with the same 
ordering.

I have run into multiple issues over time with systems that assume JSON key 
ordering is significant and preserved (CouchDB and Scuttlebutt come to mind), 
which then later have to redesign protocols or data schema because of 
interoperability problems this causes.

—Jens

* The platform's standard "dictionary" / "map" class is typically implemented 
as a hash table or binary tree, as in Java, .NET, Objective-C, Swift, Go, Lua, 
C++. (I'm unsure about Python and Ruby.) A few platforms besides JS keep an 
auxiliary key array to preserve ordering. Erlang uses a linked-list, which is 
ordered.
___


Re: [sqlite] Performance Issue on Large Table

2020-02-21 Thread Jens Alfke

> On Feb 20, 2020, at 5:24 PM, Chip Beaulieu  wrote:
> 
> CREATE INDEX `INX_Verses_BID` ON `Verses` (`BibleID`)
> CREATE INDEX `INX_Verses_BID_B_C` ON `Verses` (`BibleID`,`Book`,`Chapter`)
> CREATE INDEX `INX_Verses_BI_B_C_V` ON `Verses` 
> (`BibleID`,`Book`,`Chapter`,`VerseNumber`)

I believe the first two of those are unnecessary, since their column-lists are 
prefixes of the third. So removing them would buy you some performance.
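If you want to verify that before dropping anything, a sketch with Python's 
stdlib sqlite3 (hypothetical data): a query constraining only the leading 
column still gets an index search from the widest index.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Verses (BibleID, Book, Chapter, VerseNumber)")
db.execute("CREATE INDEX INX_Verses_BI_B_C_V ON Verses"
           " (BibleID, Book, Chapter, VerseNumber)")

# An equality on the index's leading column can use the composite index,
# so the single-column and two-column indexes add nothing.
plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Verses WHERE BibleID = 1"
).fetchall()
detail = " ".join(row[-1] for row in plan)
```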

(Disclaimer: not a guru.)

But you’ve said most of the problem is with FTS. When you update the database, 
does all of the indexed text really change? Or do many of the 30,000 new 
records contain the same text as their deleted counterparts? If the latter, you 
could optimize by not touching those rows.

It’s also possible that dropping the FTS table before the update, and 
re-creating it afterwards, would be faster than incrementally changing it.

—Jens
___


Re: [sqlite] Is randomblob(16) a good guid generation across multiple computers?

2020-02-20 Thread Jens Alfke


> On Feb 20, 2020, at 10:48 AM, Richard Hipp  wrote:
> 
> That assumption is not correct for SQLite, which does you a
> cryptographically strong PRNG.  And the SQLite PRNG is seeded from
> /dev/random on unix.

Not quite; I'm looking at the function unixRandomness() in SQLite 3.28. It's 
seeded from /dev/urandom, which on Linux "will produce lower quality output if 
the entropy pool drains, while /dev/random will prefer to block and wait for 
additional entropy to be collected." (I'm quoting the macOS man page, which 
goes on to say that on macOS it always returns high-quality randomness.)

I'm a bit doubtful about how the function handles errors, too.

* If /dev/urandom can't be opened, it instead cobbles together some decidedly 
low-entropy bytes from the results of the time() and getpid() system calls. 
IIRC this is very much like the way Netscape seeded its RNG in their first 
implementation of SSL, which turned out to be easily crackable by guessing the 
seed.

* If there's a read error on /dev/urandom, it just returns a buffer full of 
zeros, which is about as non-random as you can get.

Admittedly these are unlikely scenarios, but failure to properly seed an RNG 
can be catastrophic for security. And a lot of security exploits come from 
'unlikely' scenarios that an attacker finds a way to force.

There's a disclaimer about this, actually, inside the source code of 
sqlite3_randomness():

>   /* Initialize the state of the random number generator once,
>   ** the first time this routine is called.  The seed value does
>   ** not need to contain a lot of randomness since we are not
>   ** trying to do secure encryption or anything like that…

That's kind of at odds with your calling it a cryptographically strong PRNG. :(

—Jens
___


Re: [sqlite] Virtual table function calls

2020-02-13 Thread Jens Alfke


> On Feb 13, 2020, at 12:52 PM, David Jones  wrote:
> 
>   sqlite> select F,G,H,attr(H,3) from summary;   # show formula used 
> to calculate column H.

Maybe pass the column name as a string, i.e. `attr('H',3)`? It sounds like your 
`attr` function needs to know the _identity_ of the column, not its contents, 
and the name is basically the identity.

—Jens
___


Re: [sqlite] Compiling SQLite Encryption Extension for ARM processor

2020-02-13 Thread Jens Alfke


> On Feb 13, 2020, at 10:51 AM, Subodh Pathak  wrote:
> 
> I am looking for help to configure machine to compile SEE for ARM. I am
> using Android mobile Samsung G7.

You have to use a cross-compiler, a version of GCC that runs on your platform 
but generates ARM-Linux code. 
Specifically, to build for Android you should be using the Android Native 
Development Kit (NDK). The page Dr. Hipp linked to will show you how to do that.

—Jens
___


Re: [sqlite] JSON_EACH + recursive query = unexpected performance degradation

2020-02-12 Thread Jens Alfke


> On Feb 10, 2020, at 10:27 PM, Хусаинов Динар  
> wrote:
> 
> Problem: the query takes 3000 ms (3 seconds) on my machine to complete. If I 
> create a real table with the SAME structure, insert the SAME data into it, 
> and run the SAME query, get the SAME result back, it takes 10-15 ms (200-300 
> TIMES faster).

It sounds like SQLite is much faster at traversing real tables than it is at 
traversing JSON arrays. Which I would expect. This is exacerbated by the fact 
that the recursive CTE must be doing a lot of traversals of the JSON.

You should be able to speed this up by creating temporary tables from the JSON 
first, and then changing the CTE to use those tables.
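A minimal sketch of that approach with Python's stdlib sqlite3 (hypothetical 
names; assumes the JSON1 functions are compiled into your build):

```python
import sqlite3

db = sqlite3.connect(":memory:")
data = "[10, 20, 30, 40]"

# Materialize the JSON array once, instead of re-traversing the JSON text
# on every step of a recursive CTE. A real table can also be indexed.
db.execute("CREATE TEMP TABLE items (value)")
db.execute("INSERT INTO items SELECT value FROM json_each(?)", (data,))
db.execute("CREATE INDEX temp.items_value ON items (value)")

(total,) = db.execute("SELECT sum(value) FROM items").fetchone()
```

The CTE then joins against `items` like any other table.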

—Jens
___


Re: [sqlite] [EXTERNAL] Optimizer limitation with partial indexes

2020-02-12 Thread Jens Alfke


> On Feb 12, 2020, at 5:30 AM, Hick Gunter  wrote:
> 
> This is documented here https://sqlite.org/partialindex.html and here
> https://sqlite.org/queryplanner.html
> 
> Specifically, SQLIte does not prove theorems in first-order logic.

Thanks — I hadn't seen the section "Queries Using Partial Indexes" before, and 
it gives more detail about how the matching is done. However, it seems that my 
query does match one of the rules:

"If W [the query's WHERE clause] is AND-connected terms
 and X [the index's WHERE clause]  is OR-connected terms
 and if any term of W appears as a term of X,
 then the partial index is usable."

Here W = (expr1 > val1 OR expr2 > val2) AND expr3
 and X = expr3, which is a degenerate case of one OR-connected term.

So I'm not sure why the indexes aren't useable, unless there are limitations of 
the actual rule that aren't described in that English text.

—Jens
___


[sqlite] Optimizer limitation with partial indexes

2020-02-11 Thread Jens Alfke
I'm running into a problem with partial indexes; apparently the query optimizer 
isn't smart enough.

I currently have indexes of the form
CREATE INDEX Index1 ON Table (expr1)
CREATE INDEX Index2 ON Table (expr2)
where expr1 and expr2 are expressions involving table columns.

The problematic queries are of the form
SELECT * FROM Table WHERE (expr1 > val1 OR expr2 > val2) AND expr3
Such a query correctly uses the above indexes — the EXPLAIN command shows it's 
using a multi-index OR combining two 'search table using index' loops.

If, however, I try to make the indexes smaller by changing them to
CREATE INDEX Index1 ON Table (expr1) WHERE expr3
CREATE INDEX Index2 ON Table (expr2) WHERE expr3
the query stops using the indexes effectively. It's reduced to doing 'scan 
table using index', i.e. O(n).

It looks like what happens is that the optimizer doesn't associate the "AND 
expr3" clause with the "expr1" and "expr2" comparisons. In other words, it 
doesn't realize that (A OR B) AND C is equivalent to (A AND C) OR (B AND C).
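You can see the difference by distributing the AND by hand; a sketch with 
Python's stdlib sqlite3 (hypothetical schema standing in for my expressions):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (a, b, c)")
db.execute("CREATE INDEX i1 ON t (a) WHERE c")
db.execute("CREATE INDEX i2 ON t (b) WHERE c")

def plan(sql):
    return " ".join(r[-1] for r in db.execute("EXPLAIN QUERY PLAN " + sql))

factored    = plan("SELECT * FROM t WHERE (a > 1 OR b > 2) AND c")
distributed = plan("SELECT * FROM t WHERE (a > 1 AND c) OR (b > 2 AND c)")
# In the distributed form, each OR branch carries its own "AND c" term,
# so each branch can match its partial index via the multi-index OR path.
```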

If this were a hand-written SELECT statement it would be easy to work around 
this, but it's not. It's the output of a query translator that generates SQL, 
and it can generate arbitrary queries with arbitrary combinations of operators.

I know the SQLite optimizer isn't a Mathematica-grade symbolic logic analyzer! 
But I'm wondering if in this case there's a way around this limitation?

—Jens
___


Re: [sqlite] Please increase the default for SQLITE_MAX_VARIABLE_NUMBER

2020-02-11 Thread Jens Alfke


> On Feb 11, 2020, at 2:10 AM, Digital Dog  wrote:
> 
> Thanks for enlighening again. It was just a thought. It seems it would be a
> lot of design and code to maintain the performance while preventing the
> original problem from happening. Not worth the trouble. But maybe
> increasing the parameter count to 10.000 would not hurt?

I ran into this a few months ago. I ended up just biting the bullet and 
constructing a SQL statement by hand, concatenating comma-separated values 
inside an "IN (…)" expression.

Yes, SQL injection is a danger. But if you're being bad in just one place, and 
you review that code, you can do this safely. SQLite's C API even has a 
function that escapes strings for you, and if you're not coding in C/C++, it's 
easy to write your own; basically
str ⟶ "'" + str.replace("'", "''") + "'"
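For illustration, the full pattern in Python (a sketch; as above, only do this 
in one reviewed place, and prefer bound parameters whenever you're under the 
limit):

```python
import sqlite3

def quote_sql_string(s):
    # SQL string literal: wrap in single quotes, doubling embedded quotes.
    return "'" + s.replace("'", "''") + "'"

def in_clause(values):
    return "IN (" + ", ".join(quote_sql_string(v) for v in values) + ")"

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE people (name TEXT)")
db.executemany("INSERT INTO people VALUES (?)",
               [("O'Brien",), ("Smith",), ("Jones",)])

# Embedded quote in "O'Brien" is escaped, so this is not injectable
# by the values themselves.
sql = "SELECT name FROM people WHERE name " + in_clause(["O'Brien", "Jones"])
found = sorted(row[0] for row in db.execute(sql))
```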

—Jens
___


Re: [sqlite] Make sqlite3 database searchable on Mac OS X

2020-02-11 Thread Jens Alfke


> On Feb 10, 2020, at 8:10 PM, Peng Yu  wrote:
> 
> It seems that sqlite3 databases are not searchable by Spotlight on Mac
> OS X. Is there a way to make them searchable? Thanks.

How would Spotlight know what tables or columns to index? It doesn't understand 
what database schemas mean, and it can't tell a column with user-visible text 
apart from one with internal info.

You can write a Spotlight indexer plugin for your application that recognizes 
your file type and scans your SQLite database to generate indexing data.

The downside is that Spotlight is fundamentally file-based: it considers a file 
to be a single entity. So when the user searches for something that's in your 
database, it will just show the database file as a hit. It won't show separate 
results for every row that matches.

(Some apps work around this by creating stub files in a hidden directory, one 
file per entity. All these files need to contain is a record identifier so your 
indexer can index the file by looking at the corresponding row in the database. 
Spotlight will return the stub file as a match; when your app is told to open 
it, it looks at its record ID and displays the corresponding database row in 
its UI. Apple Mail is an example; there's a directory somewhere in 
~/Library/Mail that contains an empty file for every email message.)

—Jens

PS: Disclaimer: My knowledge of Spotlight is years out of date; it may be that 
it's advanced since then.
___


Re: [sqlite] loading extension csv.c

2020-02-07 Thread Jens Alfke


> On Feb 7, 2020, at 9:11 AM, chiahui chen  wrote:
> 
> /usr/include/sqlite3ext.h:437:53: note: expanded from macro
> 'sqlite3_vsnprintf'
> 
> #define sqlite3_vsnprintf  sqlite3_api->vsnprintf
> 
>   ~~~  ^
> 
> /usr/include/secure/_stdio.h:75:3: note: expanded from macro 'vsnprintf'
> 
>  __builtin___vsnprintf_chk (str, len, 0, __darwin_obsz(str), format, ap)
> 
>  ^

This appears to be your problem. The system header <secure/_stdio.h> is 
defining `vsnprintf` as a macro that expands to a compiler builtin. This is 
conflicting with a struct field named `vsnprintf` in the SQLite extension API.

I've never heard of <secure/_stdio.h> before, although it does exist in the 
macOS SDK. Looking through the normal <stdio.h>, it does include that header at 
the end:

#if defined (__GNUC__) && _FORTIFY_SOURCE > 0 && !defined (__cplusplus)
/* Security checking functions.  */
#include <secure/_stdio.h>
#endif

So it looks like the trigger is that you're somehow building with 
_FORTIFY_SOURCE defined, and the others are not.

Anyway, I think you could work around the problem by editing csv.c and 
inserting something like this at the top:
#include <stdio.h>
#undef vsnprintf
Or else figuring out how to turn off _FORTIFY_SOURCE.

—Jens

PS: Your use of `gcc` in the command line confused me briefly — turns out `gcc` 
on macOS is simply an alias for `cc`, so it invokes Clang. If you really want 
GCC for some reason you'd have to install it yourself and put it in your $PATH 
before /usr/bin.
___


Re: [sqlite] notify all processes of database modification

2020-02-07 Thread Jens Alfke


> On Feb 7, 2020, at 6:23 AM, Kees Nuyt  wrote:
> 
> Anyway, SQLite doesn't have such a mechanism by itself.
> Maybe inotify is useful to you :
> 
> https://en.wikipedia.org/wiki/Inotify 
> http://man7.org/linux/man-pages/man7/inotify.7.html 
> 

Or on Apple platforms, FSEvents.

On any platform, you'd need to monitor both the main database and the .wal file.

And the notification would trigger soon after a transaction began making 
changes, although the changes wouldn't be visible to you until the commit, so 
you'd probably need to start polling until you see the changes, with some 
heuristic about timing out if nothing happens for a while (e.g. if the 
transaction is aborted.)

—Jens
___


Re: [sqlite] VTable Column Affinity Question and Change Request

2020-02-07 Thread Jens Alfke


> On Feb 5, 2020, at 6:56 PM, Keith Medcalf  wrote:
> 
>> It probably doesn’t parse that declaration or
>> figure out from it what the declared types of the columns are.
> 
> Actually it does, and this is documented for the sqlite3_declare_vtab 
> function -- and in fact the column names and affinities are parsed and stored 
> in the internal Table schema.  

Sorry, I meant that the _CSV extension_ doesn't parse the 'CREATE TABLE…' 
declaration to figure out what column affinities the caller desires.

SQLite probably stores the column affinities so they can be returned from APIs 
that request them; IIRC isn't there a C API call that tells you the affinity of 
a column?

> The column names are used but the executed VDBE program does not "apply 
> affinity".

I suspect this is for the same reason that led to the immediate objections to 
your proposal: it could hurt performance.

I think your proposal makes sense given that it doesn't slow anything down if 
the extension didn't specify any column affinities. My virtual table doesn't.

—Jens
___


Re: [sqlite] VTable Column Affinity Question and Change Request

2020-02-05 Thread Jens Alfke

> On Feb 5, 2020, at 9:58 AM, Keith Medcalf  wrote:
> 
> It seems that "column affinities" are not respected in Virtual Table 
> implementations -- that is the value that is returned is the datatype 
> provided by the the vtab_cursor sqlite3_result_* function and the "column 
> affinity" from the vtab declaration is not applied.

The vtab implementation is responsible for generating the CREATE TABLE 
statement and passing it to sqlite3_declare_vtab(). It’s also responsible for 
returning column values. So I think the assumption is that it’s up to the 
implementation to be self-consistent, i.e. returning column values that match 
the declaration.

I haven’t used the CSV vtable. It looks as though its CREATE VIRTUAL TABLE 
statement takes a ‘schema’ parameter containing the SQL table declaration, that 
it then passes straight through to sqlite3_declare_vtab(). It probably doesn’t 
parse that declaration or figure out from it what the declared types of the 
columns are.

In other words this looks like a limitation of the CSV implementation, which is 
perhaps unusual in that it is not in control of the table schema it declares.

—Jens
___


Re: [sqlite] Compile error with SQLITE_OMIT_CTE (3.28)

2020-01-31 Thread Jens Alfke


> On Jan 31, 2020, at 9:55 AM, Jens Alfke  wrote:
> 
> If I define SQLITE_OMIT_CTE and compile the amalgamation (3.28.0), the build 
> fails due to two remaining calls to 
> sqlite3WithAdd() in the parser. This function is neither declared nor 
> implemented when SQLITE_OMIT_CTE is defined.

Never mind — I had somehow overlooked the bold italic "Important Note:" warning 
in the same section that lists the OMIT flags —

Important Note: The SQLITE_OMIT_* options may not work with the 
amalgamation. SQLITE_OMIT_* compile-time options usually work correctly only 
when SQLite is built from canonical source files.

Sorry for the noise :)

—Jens
___


[sqlite] Compile error with SQLITE_OMIT_CTE (3.28)

2020-01-31 Thread Jens Alfke
If I define SQLITE_OMIT_CTE and compile the amalgamation (3.28.0), the build 
fails due to two remaining calls to 
sqlite3WithAdd() in the parser. This function is neither declared nor 
implemented when SQLITE_OMIT_CTE is defined.

  case 286: /* wqlist ::= nm eidlist_opt AS LP select RP */
{
  yymsp[-5].minor.yy59 = sqlite3WithAdd(pParse, 0, &yymsp[-5].minor.yy0,
      yymsp[-4].minor.yy434, yymsp[-1].minor.yy457); /*A-overwrites-X*/
}
break;
  case 287: /* wqlist ::= wqlist COMMA nm eidlist_opt AS LP select RP */
{
  yymsp[-7].minor.yy59 = sqlite3WithAdd(pParse, yymsp[-7].minor.yy59,
      &yymsp[-5].minor.yy0, yymsp[-4].minor.yy434, yymsp[-1].minor.yy457);
}

—Jens
___


Re: [sqlite] New word to replace "serverless"

2020-01-28 Thread Jens Alfke


> On Jan 28, 2020, at 3:18 AM, Richard Hipp  wrote:
> 
> Things like MySQL-embedded and H2 run a "server" as a thread instead
> of as a separate process.  ...  So this is really the
> same thing as a server using IPC except that the server runs in the
> same address space as the client. 

I see that as a mere implementation detail, since it doesn't affect the way the 
developer configures or uses the system. Running an engine on background 
threads is an increasingly common technique as CPUs become more concurrent, 
especially in mobile apps where it's forbidden to 'jank' up the GUI by blocking 
the main thread.

For example, on iOS and macOS the HTTP client library (NSURLSession) and the 2d 
graphics compositing engine (CoreAnimation) do all their heavy lifting on 
background threads, but I wouldn't think of calling either of them server-based.

—Jens


Re: [sqlite] New word to replace "serverless"

2020-01-27 Thread Jens Alfke


> On Jan 27, 2020, at 2:18 PM, Richard Hipp  wrote:
> 
> But more recently, "serverless" has become a popular buzz-word that
> means "managed by my hosting provider rather than by me."  

I hate this buzzword. It's especially confusing because peer-to-peer 
architectures are also validly described as serverless.

> How can I fix this?  What alternative word can I use in place of
> "serverless" to mean "without a server"?

Don't change it. The term is totally accurate for describing SQLite, and the 
other terms people are suggesting are IMHO less clear.

Rather, I would add a blurb to the "SQLite Is Serverless" web page, clarifying 
that you are using the original common-sensical definition of the word, not the 
current buzzword.

—Jens


Re: [sqlite] BUG(?) in FTS5

2020-01-23 Thread Jens Alfke

> On Jan 23, 2020, at 6:47 AM, mailing lists  wrote:
> 
> The following SELECT statement fails with the error "unable to use function 
> MATCH in the requested context":

This is an annoying but documented limitation of FTS, not a bug. The MATCH 
operator can’t be used inside an OR expression. It has to be at top-level or in 
an AND.
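A minimal sketch of the restriction (assuming an SQLite build with FTS5 enabled, as in standard Python builds; table and data are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
db.execute("INSERT INTO docs VALUES ('hello world')")

# MATCH at top level (or under AND) works fine:
ok = db.execute("SELECT count(*) FROM docs WHERE docs MATCH 'hello'").fetchone()[0]
print(ok)  # 1

# MATCH inside an OR raises the documented error:
try:
    db.execute("SELECT * FROM docs WHERE docs MATCH 'hello' OR body = 'x'").fetchall()
except sqlite3.OperationalError as e:
    print(e)
```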

—Jens


[sqlite] Request: Allow virtual tables to make use of partial indexes

2020-01-16 Thread Jens Alfke
I believe I've found another limitation for efficient querying of virtual 
tables. The xBestIndex call communicates column constraints, but it doesn't 
specify whether a constraint's value is known at compile time, nor pass such a 
compile-time value to xBestIndex.

This means that the virtual-table implementation can't make use of partial 
indexes in its underlying storage, because it doesn't know whether the index's 
predicate is satisfied.

For example, say my data store is FooCabinet, and it has a FooCabinet index* on 
the "cost" column where the "type" column is equal to "expense".
Consider these two queries:
SELECT cost FROM myvtable WHERE cost > ? AND type = 'expense';
SELECT cost FROM myvtable WHERE cost > ? AND type = ?;
The first query can make use of the index; the second can't.
But in either case, my xBestIndex function is called with 
SQLITE_INDEX_CONSTRAINT_GT on "cost" and SQLITE_INDEX_CONSTRAINT_EQ on "type", 
and that's all I know. I can't tell whether the index is useable, so I can't 
use it.

Consider this an enhancement request to extend the sqlite3_index_info struct 
somehow to convey compile-time column constraint values to the extension, 
perhaps by adding a field
const sqlite3_value** const constraintValues;

Thanks,

—Jens

* Obviously we are not talking about SQLite indexes here! My underlying data 
store has its own partial-index functionality.


Re: [sqlite] Best way to store key,value pairs

2020-01-13 Thread Jens Alfke


> On Jan 13, 2020, at 1:45 PM, James K. Lowden  wrote:
> 
> So, basically, a normalized design requires too much use of INSERT?  
> You're making an efficiency argument here, or maybe
> ease-of-implementation assertion. For me, inserting one header row or
> 20 is the same coding effort (still need a loop).

I was suggesting _zero_ header rows, i.e. store the headers as a blob column in 
the messages table, instead of having a separate table.

> I think transaction throughput would be about the same if COMMIT is applied 
> only to whole messages.  

There are other pain points. From what I've seen in the past, IIRC one of them 
is managing the foreign-key constraints (e.g. SQLite finding and deleting 
header rows when a message row is deleted), and the necessity of using JOINs to 
get at the headers.

Anyway: I merely said to "consider encoding the headers as JSON". Certainly 
didn't mean to imply it was the canonical answer.

—Jens


Re: [sqlite] Best way to store key,value pairs

2020-01-13 Thread Jens Alfke


> On Jan 12, 2020, at 4:12 PM, James K. Lowden  wrote:
> 
> What is the motivation behind this advice?  It's completely unnecessary.  

Thanks for your opinion, James! I disagree. 

RFC822 headers are schemaless, and in a Usenet or email database they have 
rather high volume (probably 20+ per message) but go mostly unused. An NNTP 
server can't simply throw away the headers it doesn't need, but IMHO there are 
too many of them to do the usual SQL thing and add every header of every 
message to a table — that multiplies the number of inserts and deletes by an 
order of magnitude.

> For all the complexity JSON adds to the design, it adds exactly no power: 
> precisely the same effects can be achieved without it.  

Well sure, but you could say the same thing about indexes, couldn't you?

I disagree about complexity. Encoding headers as JSON is pretty simple if 
you've got a JSON encoder handy, simpler than executing a SQL INSERT. And the 
JSON API is very easy to use.

> I see no advantage to introducing JSON to a system with no external use for 
> it.  

Hm, a number of database companies (such as my employer, Couchbase) and their 
customers would disagree with you :)

—Jens


Re: [sqlite] Test failures on GPFS

2020-01-11 Thread Jens Alfke

> On Jan 11, 2020, at 2:58 PM, T J  wrote:
> 
> I was interested in using sqlite over GPFS.

The standard advice on using SQLite over a network file system is “don’t do 
it.” Even if you find the rare file system that handles locks properly, you’ll 
likely have performance issues.

A client/server database like Postgres or MySQL is a better fit for a 
distributed use case. If you’re sending everything over the network, it makes 
more sense to send just the queries & results, not the innards of the b-tree 
too. Is there a reason you can’t use one of those?

—Jens


Re: [sqlite] Best way to store key,value pairs

2020-01-11 Thread Jens Alfke

> On Jan 8, 2020, at 3:13 AM, Simon Slavin  wrote:
> 
> I believe Jens' point is valid, as long as you don't have to search/scan 
> headers.

You can even do that — to search for a specific header’s value, just create an 
index on json_extract(headers, '$.Header-Name'), then in a query use that same 
expression in an equality or relational comparison.
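For example (schema and header name invented for illustration; the index only applies when the query repeats the exact indexed expression):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (headers TEXT)")
# Index one header extracted from the JSON blob:
db.execute("""CREATE INDEX msg_subject
              ON messages(json_extract(headers, '$.subject'))""")
db.execute("""INSERT INTO messages VALUES ('{"subject":"Hello","from":"a@b"}')""")
# The query must use the identical expression for the index to be usable:
row = db.execute("""SELECT count(*) FROM messages
                    WHERE json_extract(headers, '$.subject') = 'Hello'""").fetchone()
print(row[0])  # 1
```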

(You’ll need to normalize the case of header names during the JSON conversion 
while inserting, since JSON keys are case-sensitive but RFC822 header names 
aren’t.)

—Jens


Re: [sqlite] Best way to store key,value pairs

2020-01-07 Thread Jens Alfke
Consider encoding the headers as JSON and storing them in a single column. 
SQLite has a JSON extension that makes it easy to access values from JSON data 
in a query. You can even index them.

Simon’s suggestion (a row per header) is correct in theory, but the large 
numbers of headers you’ll be storing in an NNTP database will make that 
approach pretty expensive, I think. 

(I’ve mostly given up on relational-database orthodoxy, and doing so helped 
make version 2 of my program about 5x faster than version 1.)

—Jens


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-03 Thread Jens Alfke

> On Jan 2, 2020, at 11:54 AM, Doug  wrote:
> 
> I know there has been a lot of talk about what can and cannot be done with 
> the C calling interface because of compatibility issues and the myriad set of 
> wrappers on various forms. I’m having a hard time letting go of a possible 
> 25% performance improvement.

This was a heavily multithreaded benchmark (64 threads accessing the same 
connection) on a very hefty server-class CPU. From Dr Hipp’s results, it sounds 
like the speed up may be only in similar situations, not to more normal SQLite 
usage.

—Jens


Re: [sqlite] New SQL functions available for testing only.

2020-01-01 Thread Jens Alfke

> On Jan 1, 2020, at 8:03 AM, Richard Hipp  wrote:
> 
> expr_implies_expr(A,B)
> 
> Return non-zero if expression A implies expression B.  See
> https://www.sqlite.org/src/artifact/971e442fd5?ln=5277-5299 for
> details.  This routine is used to determine if a partial index is
> usable.

Thanks! This is also likely useful for programs that create their own synthetic 
indexes out of secondary tables. (I do this, partly because I need to index 
multiple values per row, e.g. indexing the elements of a JSON array.) If such 
an ‘index’ is partial, I need to figure out when it’s applicable to a query.

—Jens


Re: [sqlite] Causal profiling

2019-12-31 Thread Jens Alfke


> On Dec 31, 2019, at 7:03 AM, Richard Damon  wrote:
> 
> 1) The C API between separate compilations is very well established, and 
> fully documented in most environments.

Agreed. APIs between library boundaries generally need to be C. This is pretty 
easy to do though (I've done it multiple times in my current job.) You just 
write some C wrappers for the public methods.


> On Dec 31, 2019, at 7:19 AM, Simon Slavin  wrote:
> 
> 5) SQLite has to work on your set top box.  On your Wifi base station.  On 
> your Home hub.  On the machine that runs the car park.  All these things have 
> C compilers.  Not all these things have C++ compilers.


This may have been an issue ten years ago, but is it still? I did some R&D on 
embedded systems last year, and C++ support looked pretty ubiquitous. The 
heftier embedded boards run Linux, the middleweight ones have C++-friendly 
environments like mbedOS or ESP32-IDF, and even tiny 8-bit Arduino 
microcontrollers have an OO C++ API, even though they call it C to avoid 
scaring the newbies.

—Jens


Re: [sqlite] Causal profiling

2019-12-31 Thread Jens Alfke

> On Dec 30, 2019, at 6:23 PM, Alexander Vega  wrote:
> 
> -fdevirtualize
> -fdevirtualize-speculatively

I believe those are C++ optimizations, specific to C++ virtual methods. What 
SQLite is doing is similar to a class with virtual methods, but done “by hand” 
in plain C, so the optimizer doesn’t know how to apply that optimization.

(I’m fighting the urge to write an off-topic rant about why anyone still 
programs in C when they could write roughly the same code much more simply and 
cleanly in C++ using only the most basic features like classes and methods. 
I’ve spent too much time lately trying to figure out or debug hellacious C 
spaghetti code that’s awkwardly reinventing wheels like inheritance and 
constructors/destructors.)

And in any case, de-virtualization only works when a virtual method has only 
one implementation, i.e. is monomorphic, and that isn’t true here.

—Jens


Re: [sqlite] Causal profiling

2019-12-30 Thread Jens Alfke


> On Dec 30, 2019, at 7:19 AM, Doug  wrote:
> 
> I am suggesting that if the we added the global calls to the underlying 
> functions to the API - that is, the functions that are called by the function 
> table indirection - then one could code the application to call the 
> underlying functions.

What you're describing is basically the SQLITE_CONFIG_SINGLETHREAD compile 
option. In that configuration SQLite does not use any mutexes, and you're 
responsible for "call[ing] the underlying functions" (the mutex lock/unlock 
functions) yourself.

However, in this configuration any client code that makes concurrent calls to 
SQLite — and there is a lot of code like that — will crash and burn.

> If you are going to tell me that you need to maintain two versions of your 
> application if you run it in a single-thread environment or a multi-thread 
> environment

I don't know if someone else was saying that; what I was pointing out earlier 
is that many clients of SQLite link to it as a shared library provided by the 
OS, either built-in (as on iOS and macOS) or installed by a central package 
manager (Linux). This means they don't specify compile-time options, they 
implicitly get the behavior configured by the OS or package vendor, which I'd 
guess is probably MULTITHREAD. If the SQLite they link with changes its 
behavior to avoid mutexes, many of these programs will break.

> Thinking about it, I'm surprised that the C API isn't just a set of macros 
> already.

What you're describing is basically the effect of link-time optimization (LTO). 
The optimizer runs over the entire program's compiled code (not just one 
compilation unit) and applies optimizations such as inlining. This can indeed 
have a big impact.

—Jens


Re: [sqlite] Causal profiling

2019-12-27 Thread Jens Alfke


> On Dec 26, 2019, at 3:45 PM, Doug  wrote:
> 
> Can you explain what the API is that you use to "set concurrency levels per 
> connection", please? Is it a parameter on open() or its own function call? 
> And how would the API break, exactly?

sqlite3_config(), specifically the SQLITE_CONFIG_SINGLETHREAD, 
SQLITE_CONFIG_MULTITHREAD, SQLITE_CONFIG_SERIALIZED options.

This API would break because configuring those options at runtime would have no 
effect on behavior; the only thing that would change threading behavior would 
be the compile-time flags SQLITE_MUTEX_OMIT, etc.

(This is actually global, not per-connection, but that doesn't invalidate what 
I said.)

> The talk suggested removing the SQLite virtual table of functions 
> (specifically the call to free a mutex). The user calls the function 
> directly. How does that break an API?

If SQLite's implementation directly called the mutex lock/unlock functions, 
instead of indirecting, then there would be no way to control whether or not 
mutexes were used. In other words, it would be impossible to change any of the 
above options at runtime.

> The talk suggested removing the SQLite virtual table of functions 
> (specifically the call to free a mutex). The user calls the function 
> directly. How does that break an API?

No, the user does not call those functions directly. The code shown in the 
video is deep inside SQLite itself and not visible through the API. (You say 
you're using a TCL wrapper … so you may not be aware of what is or isn't in the 
C API. Trust me, I use the C API a lot.)

—Jens



Re: [sqlite] Causal profiling

2019-12-26 Thread Jens Alfke


> On Dec 25, 2019, at 2:53 PM, Doug  wrote:
> 
> I wrote an application in Qt which uses SQLite. Therefore, I invoke SQLite 
> functions with some wrapper. For a 9% performance improvement in SQLite using 
> the direct call versus indirect call (as discussed in the talk), cannot the 
> wrapper functions be changed so my application doesn't know the difference?  

This change would break the API that lets you set concurrency levels per 
connection; instead, the concurrency would be hardcoded at compile time. _You_ 
may not be using this API, but there are definitely developers who do.

This would especially be a problem for environments where SQLite is provided as 
a shared library in the operating system (e.g. iOS, macOS). The concurrency 
level would then be up to the platform vendor, not the application developer. 
(Unless they bundle their own copy of SQLite in their app.)

One possible workaround would be a compile option that enables the direct 
calls, but which is off by default. People who wanted mutexes but with direct 
calls could then set that option.

> On Dec 25, 2019, at 3:25 PM, Richard Hipp  wrote:
> 
> By "untestable" I mean that we would be unable to obtain 100% MC/DC
> (essentially 100% branch test coverage) in an SQLite compiled as for
> delivery.

Because you'd only be testing one concurrency mode? But there are plenty of 
existing compile-time options in SQLite; don't you have to test multiple builds 
of the library to test those? How would this one be different?

—Jens


[sqlite] Compiling SQLite without the database storage?

2019-12-21 Thread Jens Alfke
This may sound crazy, but is there a way to compile SQLite without its B-tree 
and table code? This would be for a use case with _only_ virtual tables — i.e. 
SQLite being used as a SQL query engine on top of a different data store*. It 
would be nice not to drag in too much unused code.

—Jens

* Yes, I happen to be experimenting with LMDB...


Re: [sqlite] Multi layer JSON query

2019-12-20 Thread Jens Alfke


> On Dec 19, 2019, at 7:45 PM, No.1 Perfect <757171...@qq.com> wrote:
> 
> How can I count the amountand numof goods when the customer or 
> goodsname is different.

Querying the contents of arrays is kind of complicated. SQL doesn't understand 
arrays, so the query has to use a "table-valued function", json_each, that 
makes the array appear to be a table where each item is a row. Then you can use 
a JOIN to query that "table".

The documentation has examples: https://sqlite.org/json1.html#jeach 
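A minimal sketch of that pattern (schema and JSON invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, goods TEXT)")
db.execute("""INSERT INTO orders VALUES
              ('alice', '[{"name":"pen","num":2},{"name":"ink","num":1}]')""")
# json_each makes each array item a row; join it like a table:
rows = db.execute("""
    SELECT o.customer,
           json_extract(j.value, '$.name') AS goodsname,
           sum(json_extract(j.value, '$.num')) AS num
      FROM orders AS o, json_each(o.goods) AS j
     GROUP BY o.customer, goodsname
     ORDER BY goodsname""").fetchall()
print(rows)  # [('alice', 'ink', 1), ('alice', 'pen', 2)]
```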


—Jens

PS: Your email program is turning nonbreaking spaces into "", which makes 
your message hard to read. Please try to fix that.


Re: [sqlite] mysql -> sqlite problem

2019-12-19 Thread Jens Alfke


> On Dec 19, 2019, at 4:49 PM, Mike Bentley  wrote:
> 
> Is it true that "Each index name must be unique in the database"?

Yes. You're right that the docs don't seem to spell this out, but since the 
syntax to delete an index is "DROP INDEX <index-name>", you couldn't have 
multiple indexes with the same name because it wouldn't know which one to 
delete.

You can pretty easily get a unique index name by prefixing the name of the 
table.
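For example (names invented; the second CREATE INDEX fails because index names share one namespace per database):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE a (x)")
db.execute("CREATE TABLE b (x)")
db.execute("CREATE INDEX a_x ON a(x)")       # table-prefixed name stays unique
try:
    db.execute("CREATE INDEX a_x ON b(x)")   # same name on a different table
    reused = True
except sqlite3.OperationalError as e:
    reused = False
    print(e)  # index a_x already exists
print(reused)  # False
```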

—Jens


Re: [sqlite] Slightly offtopic: indexes vs. hash tables

2019-12-13 Thread Jens Alfke
Thank you, Simon. That was an interesting article, and even better, it has a 
link to a book-length PDF of “Modern B-Tree Techniques” that is *solid gold*. 
I’ve been wanting to learn more about b-trees, indexing, query planning, etc. 
and this book goes way beyond anything I’ve found previously.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.219.7269

—Jens 

> On Dec 9, 2019, at 10:33 AM, Simon Slavin  wrote:
> 
> During the lower-volume weekdays, I beg you indulgence for another off-topic 
> post.
> 
> 
> 
> This article contrasts hash tables vs. indexes, in an attempt to explain why 
> indexes are the basis of most DBMSes but hash tables are the basis of many 
> in-memory search systems.
> 
> It's a semi-technical article, but if you don't understand what "O(log n)" 
> means, don't worry, you can skim over that level and still understand the 
> discussion.
> 
> SQLite uses hash tables internally, though not for data stored in tables.  
> But given there's already an embedded LSM library, it's not impossible that 
> SQLite might use hashing for some data purpose in the future.


Re: [sqlite] Coding standard

2019-12-13 Thread Jens Alfke



> On Dec 12, 2019, at 11:23 AM, Warren Young  wrote:
> 
> I wouldn’t dismiss this warning

I wouldn’t dismiss a warning about the full scenario. (In fact I wasn’t aware 
that assignment to a field might overwrite pad bytes; that’s good to know.)

But warning about every call to memset is counterproductive, because it’s much 
too noisy. Memset is used often in situations other than zeroing padded 
structures. There are common ways to zero structs that don’t involve memset— 
like initializing it with “= {}” or using calloc to allocate one on the heap. 
And probably 99% of the time a struct is zeroed, it’s not going to be passed 
across a trust boundary.

It’s kind of like your mom warning you every time you get on your bike, because 
one time a kid rode their bike up to the old quarry and went swimming and 
drowned.

—Jens


Re: [sqlite] Coding standard

2019-12-12 Thread Jens Alfke


> On Dec 12, 2019, at 10:36 AM, Simon Slavin  wrote:
> 
> For instance CodeSonar reports every use of memset() because you /can/ leak 
> uninitialised bits of memory using memset() (CERT C Section 3.6 DCL39-C).
> But it has no way to check whether what you're doing with memset() does 
> initialise all bits.

That seems like a silly warning to produce. Reading the CERT section in 
question*, the real issue is caused not by memset but by writing to a field of 
a struct AFTER zeroing the struct with memset, AND only if the struct field has 
padding, AND only if the compiler optimizes the write in a particular way, AND 
only if the struct is then copied to a separate trust domain.

I don't know anything about CodeSonar, but this particular warning sounds like 
it was added only because it's easy to implement with 'grep'; whereas a useful 
warning that detects the actual situation described by CERT requires something 
on the order of valgrind or the Clang Address Sanitizer, to detect which bytes 
within the struct actually are garbage.

—Jens

* 
https://resources.sei.cmu.edu/downloads/secure-coding/assets/sei-cert-c-coding-standard-2016-v01.pdf



Re: [sqlite] sqlite sync over network

2019-12-12 Thread Jens Alfke


> On Dec 9, 2019, at 7:43 AM, George  wrote:
> 
> (litesync - SQLite Replication and Synchronization) 
> litesync seems to be a version of sqlite made by Ramos Bernardo from Brazil.

Interesting, but I don't see how they can make those kinds of claims without 
running into conflicts on primary keys. They say you can't use AUTOINCREMENT, 
for obvious reasons, but don't describe how else primary keys would be assigned.

The whole thing appears to be closed-source only, and from a developer I've 
never heard of, so I'd be cautious and build some software to try to evaluate 
it before you plunk down money.

—Jens


Re: [sqlite] sqlite sync over network

2019-12-05 Thread Jens Alfke


> On Dec 5, 2019, at 8:05 AM, George  wrote:
> 
> Changes -> bidirectional. All terminals can save to db. Amount of collected 
> data it's not big and frequency between readings will be minimum 2sec. When 
> we push more readings. Data itself it's very small. Like reading temperature 
> from a probe.

The product I work on — Couchbase Mobile* — does this, but it may not be 
exactly what you want because it uses SQLite internally on the client side, and 
not at all on the server. So it's not something you can plug your existing 
SQLite-based code into. (Our data model is not relational but JSON-based.)

Data sync is hard. I've been working in this area since 2011 so I think I have 
some expertise here :)
Problems you're likely to run into:

1. You can't use regular integer primary keys, because different clients will 
end up creating rows with the same keys and cause collisions when they sync. 
You either have to use UUIDs as keys, or else somehow partition the key space 
in a deterministic way, like prefixing a fixed client ID to a key.
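Point 1 can be sketched like this (schema invented for illustration; UUID keys generated independently on different clients will not collide when the rows are merged):

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE readings (id TEXT PRIMARY KEY, temp REAL)")
# Each client mints its own globally unique keys instead of integer rowids:
for temp in (20.5, 21.0, 19.8):
    db.execute("INSERT INTO readings VALUES (?, ?)", (str(uuid.uuid4()), temp))
count = db.execute("SELECT count(*) FROM readings").fetchone()[0]
print(count)  # 3
```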

2. If multiple entities can update the db, there will likely be conflicts. 
Conflicts are hard to manage, and how you do it is entirely dependent on your 
high-level schema. In the worst case, conflicts require human intervention.

3. You need a message-oriented protocol. It's best to keep a continuous 
bidirectional connection open. WebSockets is a good protocol for this. You'll 
have to deal with errors establishing the connection, and unexpected 
disconnects, by periodic retries.

4. Schema upgrades in a distributed system are a mess. If your system is 
centralized enough you can take it down and upgrade every peer's database, then 
bring it back up, but of course that doesn't work in a decentralized system. 
(One of the main reasons Couchbase is schema-less.)

5. Keep in mind there is no central source of truth. Data takes finite time to 
propagate, and transient errors greatly increase that time. Even if you have a 
central server, it will be behind the clients that create the data, so it 
doesn't have the latest info. No one does.

—Jens

* https://www.couchbase.com/products/mobile


[sqlite] virtual tables vs. expression-based indexes

2019-11-27 Thread Jens Alfke
I'm considering using a virtual table to query an external-to-SQLite data 
store. However, I've carefully read about the xBestIndex method, and it appears 
that virtual tables cannot have indexes on expressions; or rather that the 
SQLite query engine can't make use of such indexes, only indexes on columns.

Consider for example a virtual table with a column named "text", and a query 
with `WHERE length(text) > 100`. In my external data store I can create an 
index on `length(text)`, but it doesn't look as though SQLite has any way of 
asking me about it, so I assume it will just brute-force scan through every row.
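For contrast, an ordinary (non-virtual) table supports exactly this kind of expression index (schema invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (body TEXT)")
# Expression index on length(body) — no virtual-table equivalent exists:
db.execute("CREATE INDEX t_len ON t(length(body))")
db.executemany("INSERT INTO t VALUES (?)", [("x" * n,) for n in (50, 150, 200)])
rows = db.execute("""SELECT length(body) FROM t
                     WHERE length(body) > 100 ORDER BY 1""").fetchall()
print(rows)  # [(150,), (200,)]
```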

The only workaround I can see is to add a virtual table column for every 
possible expression that might be queried against — like "text_length" — but 
the query interface in my project is open-ended enough that I can't delimit the 
set of expressions that might need to be exposed this way. (It might be 
feasible if I could alter the table on the fly to add columns as needed, but 
the docs explicitly say I can't do that.)

—Jens


Re: [sqlite] Concurrency Question

2019-11-23 Thread Jens Alfke

> On Nov 23, 2019, at 7:17 AM, Dan Kennedy  wrote:
> 
> This should only happen if you are using shared-cache mode. Don't use 
> shared-cache mode.

Shared-cache mode also breaks Isolation between connections — during a 
transaction, other connections will see the writer’s intermediate state. (IIRC. 
It’s been a few years.)

In my experience, it’s only useful if all connections are read-only, or if 
you’re willing to use your own mutexes to keep writers from screwing up readers 
(in which case you might as well just share a single connection, right?)

—Jens


[sqlite] Request: Combining skip-scan with 'max' optimization

2019-11-21 Thread Jens Alfke
I'm following up on my "Optimizing `SELECT a, max(b) GROUP BY a`" thread from a 
few weeks ago, rephrasing it as a clearer enhancement request.

ACTUAL BEHAVIOR: A query of the form `SELECT a, max(b) GROUP BY a` runs slowly 
(O(n) with the number of table rows), even if there is an index on (a, b DESC). 
The query plan explanation says "SCAN TABLE ... USING INDEX". This is in SQLite 
3.28.

EXPECTED BEHAVIOR: Query runs faster :-) My big-O fu is not strong enough to 
express it that way, but I'd imagine it to be proportional to the number of 
distinct `a` values, not the number of rows in the table.

DIAGNOSIS: According to Keith Medcalf, "it appears that the optimizer will not 
utilize a skip-scan *AND* apply the max optimization concurrently."

According to Keith, a workaround is to rewrite the query as
select name,
   (select max(timestamp)
  from table
 where name=outer.name)
  from (select distinct name
  from table);

This is of course a lot more complex. And unfortunately in my case the query 
generator my program uses does not (yet) have the capability to generate nested 
SELECTs, so the optimization is unavailable to me until/unless we implement 
that.
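The two forms return identical results; a quick check (table and column names invented, index shaped as described above):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE obs (name TEXT, timestamp INTEGER)")
db.executemany("INSERT INTO obs VALUES (?, ?)",
               [("a", 1), ("a", 5), ("b", 2), ("b", 9)])
db.execute("CREATE INDEX obs_idx ON obs(name, timestamp DESC)")
# Simple GROUP BY form:
simple = db.execute(
    "SELECT name, max(timestamp) FROM obs GROUP BY name ORDER BY name").fetchall()
# Nested-SELECT rewrite (subquery alias 'o' stands in for the outer reference):
nested = db.execute(
    "SELECT name, (SELECT max(timestamp) FROM obs WHERE name = o.name)"
    " FROM (SELECT DISTINCT name FROM obs) AS o ORDER BY name").fetchall()
print(simple == nested, simple)  # True [('a', 5), ('b', 9)]
```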

—Jens


Re: [sqlite] What is the C language standard to which sqlite conforms ?

2019-11-21 Thread Jens Alfke


> On Nov 21, 2019, at 7:01 AM, Richard Hipp  wrote:
> 
> The memset() just forces the bug to the surface in builds where the
> ckmalloc()/ckfree() routines of TCL are using caching that prevents
> valgrind/ASAN from seeing the use-after-free.  The memset() is not
> part of the bug fix itself, but is a preventative measure to try to
> prevent similar bugs in the future.

This looks wrong to me:

  memset(p->pVfs, 0, sizeof(sqlite3_vfs));
  memset(p, 0, sizeof(Testvfs));
  ckfree((char *)p->pVfs);
  ckfree((char *)p);

The second line zeroes the Testvfs struct pointed to by p;
the third line reads the pVfs field of the struct, which is now NULL,
and then calls free() on that NULL pointer, which is a no-op.
The net result is to leak the heap block pointed to by p->pVfs.

Shouldn't the second and third lines be swapped?

—Jens


Re: [sqlite] What is the C language standard to which sqlite conforms ?

2019-11-19 Thread Jens Alfke


> On Nov 19, 2019, at 5:29 AM, Dennis Clarke  wrote:
> 
> Yes I have tried gcc 9.2.0 and the whole process fails in the tests
> and no it will not compile as C90 code.

Have you tried just not forcing strict compliance? Which is the way people 
normally build SQLite?

—Jens

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] What is the C language standard to which sqlite conforms ?

2019-11-18 Thread Jens Alfke


> On Nov 18, 2019, at 2:44 PM, Dennis Clarke  wrote:
> 
>  On some machines with different compilers I get good
> results using C99 strict compliance. On other machines, such as those
> running Red Hat Enterprise Linux, I get terrible results.

Why does it matter to you? I usually worry about compliance in my own code, not 
in 3rd party code that's known to work.

The parts of SQLite source code I've seen seem to be C90, in that they don't 
use "//" comments and put all their variable declarations at the top of a 
function. But I have no idea whether it's expected to conform to that or not.

> On other machines, such as those running Red Hat Enterprise Linux, I get 
> terrible results.

What kind of results? RHEL tends to have old versions of everything, so it may 
be that its version of Clang/GCC has bugs with checking language compliance.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Jens Alfke


> On Nov 11, 2019, at 11:09 AM, Jose Isaias Cabrera  wrote:
> 
> Compared to me, you are a genius in everything, but you just lack a little 
> bit of understanding about other languages and their localization behavior.  
> As a Technical Project Manager for 12 years on my last job, all of these 
> statements that I am making, and yours above, were part of my daily routines. 

Well, I did spend years at Apple working on several GUI apps (iChat, Address 
Book, etc.) that were, of course fully localized. So yes, I understand what 
you're saying here. But doesn't it prove my point? You'd never just arbitrarily 
decide to display 20 characters in some view in a real app. Instead you'd check 
the bounds of the view at runtime and render text to fit in that. Or going the 
other way, you'd compute the width of the text in the specified font, then 
resize the view to fit it.

The details of the correspondences between bytes, code points,  characters, and 
glyphs are really complex. Using UTF-32 only helps with the first of those 
mappings; you still have to pay attention to the rest. A 4x increase in space 
usage just isn't a good trade-off for such a limited benefit.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Jens Alfke


> On Nov 11, 2019, at 9:39 AM, Jose Isaias Cabrera  wrote:
> 
> However, space is cheap now

It isn't. A sizable fraction of all software development is done for devices 
with under a megabyte of RAM. (IoT and embedded are huge markets.)  And 
remember, we're talking on the email forum for a library that's heavily used on 
that scale of hardware.

And even on bigger systems, L1 and L2 caches are small: Intel's i7 Haswell CPUs 
have 32KB and 256KB per core, respectively. Remember, "overflowing cache" is 
the new "VM thrash" — RAM is absurdly slow compared to CPU speeds.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Jens Alfke


> On Nov 11, 2019, at 9:30 AM, Jose Isaias Cabrera  wrote:
> 
> Say that I am writing a report and I only want to print the first 20 
> characters of a string, that would be something like,

Hang on — why exactly 20 characters? Of text in an arbitrary language, which is 
to be displayed in an arbitrary font, with an arbitrary line width?

I don't know about you, but the only time I think about "exactly 20 characters" 
is when I'm writing to a terminal window. Most of the time that's ASCII, and 
even if it isn't, Asian characters are going to be double-width, and emojis 
might render an arbitrary number of code points to a single-width graphic. So 
"20 characters" still doesn't map to a fixed width onscreen.

If I'm writing a report, I more likely want to render only the first _line_ of 
a string, i.e. enough text to fill some number of points/millimeters/pixels of 
space, possibly minus an ellipsis. That is a job for a text rendering library, 
not `writeflen`.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Jens Alfke

> On Nov 11, 2019, at 7:49 AM, Jose Isaias Cabrera  wrote:
> 
> if you want to count characters in languages such as Arabic, Hebrew, Chinese, 
> Japanese, etc., the easiest way is to convert that string to UTF32, and do a 
> string count of that UTF32 variable.

No, the easiest way is to ask your string class/library what the character 
count is, and let _it_ deal with the fiddly details. 

Or to consider why you need the character count in the first place — it’s 
usually not something that’s useful to know. Usually what you’re really asking 
is “how many pixels wide will this render?” or “how many bytes will this 
occupy?” or even “let me iterate over each character”.

At a low level, UTF-8 makes a lot more sense. It’s very compact, which is 
important for cache coherency as well as storage space. It’s upward compatible 
with ASCII, which is extremely convenient for text-based protocols / file 
formats / languages, and for working with legacy APIs (like !)

Modern libraries seem to be moving to UTF-8. For instance, Apple’s been 
migrating Swift’s string class from a legacy UTF-16 encoding to UTF-8, and 
playing up the consequent performance and space win. Go has been UTF-8 from the 
start. I don’t know of a single library that’s gone with UTF-32, except maybe 
as an option.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Things you shouldn't assume when you store names

2019-11-10 Thread Jens Alfke

> On Nov 10, 2019, at 4:03 AM, Richard Damon  wrote:
> 
> Actually, 'The Artist whose name formerly was Prince' (which wasn't his
> name, his legal name was an unpronounceable pictograph), breaks every
> computer system I know.

Unicode Character PRINCE (U+1F934)
https://www.fileformat.info/info/unicode/char/1f934/index.htm

Oh wait, wrong Prince...

There’s always this:
https://parkerhiggins.net/2013/01/writing-the-prince-symbol-in-unicode/

But all kidding aside, databases are created to serve people, not the other way 
around. To declare that a database schema is the truth is absurd.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Things you shouldn't assume when you store names

2019-11-09 Thread Jens Alfke
On Nov 9, 2019, at 1:09 PM, sky5w...@gmail.com wrote:
> 
> In this case, data modelers hoping to save a column. arrggg.
> It flies in the face of data normalization and pushes the problem down the
> line.

But you _cannot_ normalize people’s names; that’s the exact point of that 
article. Anything you assume about the structure of a name will be wrong in 
some culture.

-Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Reading a SharePoint file

2019-11-08 Thread Jens Alfke


> On Nov 8, 2019, at 12:21 PM, Jose Isaias Cabrera  wrote:
> 
> Yeah, that is what I am doing now.  I was trying to save time to just be able 
> to read a few tables and see if I needed to update it, so then, download it 
> and upload it.  But now, I have to download it, and read it, and delete the 
> local one. Thanks.

SharePoint might support WebDAV, and most OSs support (or used to support) 
mounting WebDAV as a filesystem. In that case you could point SQLite at the 
database in WebDAV mount. At that point it depends on whether the filesystem 
handles file reads by issuing HTTP byte-range requests, or if it just downloads 
the entire file to a local cache file first. In the former case it'd be faster 
than downloading, in the latter case not.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Reading a SharePoint file

2019-11-08 Thread Jens Alfke


> On Nov 8, 2019, at 11:57 AM, Jose Isaias Cabrera  wrote:
> 
> Is there any way that SQLite can read a file on a Sharepoint site? ie 
> https://some.site.com/sites/ROC/Docs%20comp/Shared.Projects_DB.sqlite3
>  
> 
>  Thanks.

The obvious answer is to download the file over HTTP and then open the local 
file with SQLite.

If you really wanted to you could write custom code to download from HTTP into 
memory and then open it as an in-memory database, but that doesn't seem to have 
any real advantages.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLITE_DETERMINISTIC and custom function optimization

2019-11-08 Thread Jens Alfke


> On Nov 8, 2019, at 11:08 AM, Mario M. Westphal  wrote:
> 
> The EXPENSIVE_FUNCTION function is referred multiple times in the update
> statement. But it always returns the same result (for any given row).

There was a similar thread (that I started, I think) from two years ago with 
subject "Common subexpression optimization of deterministic functions". It's 
worth a read.

The takeaway is that SQLite's query optimizer does not remove redundant calls 
to a deterministic function. IIRC, there weren't any good workarounds for this.
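
The redundant evaluation is easy to observe from Python's sqlite3 module
(assuming Python 3.8+ for the `deterministic` flag; `expensive` here is a
hypothetical stand-in for the user's function):

```python
import sqlite3

calls = 0

def expensive(x):
    """Stand-in for the costly function; counts how often SQLite invokes it."""
    global calls
    calls += 1
    return x * 2

db = sqlite3.connect(":memory:")
db.create_function("expensive", 1, expensive, deterministic=True)
db.executescript("CREATE TABLE t (x); INSERT INTO t VALUES (1), (2), (3);")

# Reference the function twice per row: despite being deterministic,
# it is evaluated once per reference, not once per row.
db.execute("SELECT expensive(x), expensive(x) FROM t").fetchall()
print(calls)
```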

… But this makes me think of the upcoming virtual column feature. If you define 
a virtual table column whose value is equal to EXPENSIVE_FUNCTION(), do 
multiple references to that column in a query cause multiple calls to the 
function, or is it computed only once per row?

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite with branching

2019-11-07 Thread Jens Alfke


> On Nov 7, 2019, at 9:02 AM, Bernardo Ramos  wrote:
> 
> If you are interested in just the performance without the branching feature, 
> there are at least 3 options:
> 
> 1. SQLigthning: I was thinking in updating it to the last version of SQLite

That would be awesome! I have looked at it a few times, but it's based on such 
an old version that it's useless to me in its current state.

> 2. Modified version of LiteTree, without branches

I'm curious how performance of LiteTree (w/o branching) compares to 
SQLightning, i.e. whether storing rows or pages is more efficient. Have you 
measured?

> 3. SQLite with mmap

IIRC, this is not nearly as fast as either of the LMDB-based approaches. I 
don't know why; presumably SQLite doesn't make as efficient use of 
memory-mapping.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Limit on number of columns in SQLite table

2019-11-06 Thread Jens Alfke


> On Nov 6, 2019, at 9:25 AM, Digital Dog  wrote:
> 
> If there are millions or billions of rows
> in the data set I consider it big data and the only reasonable format for
> storing it is a column store format.

There are many types of stores for "big data". My employer, Couchbase, has 
customers who would disagree with you; they use our document database to store 
millions or billions of rows.

It depends on the data type. For highly structured data with many sparse 
columns you may be right. For less structured data a key-value or document 
store is better. I'm sure there are other types of big-data storage I'm unaware 
of.

In the original poster's case, it didn't seem like the data set really had 
zillions of columns, since [IIRC] they didn't need to be queried separately. 
You could put something like that in a key-value store with each value an 
encoded C array, for example.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite with branching

2019-11-05 Thread Jens Alfke


> On Nov 5, 2019, at 1:27 AM, Dominique Devienne  wrote:
> 
> AFAIK, that was one of the goals of SQLite4 [1], to change the backend to LSM.

LMDB (LiteTree's back-end) doesn't use LSM; it's a B-tree manager. The speedup 
appears to come from a combination of techniques like eliminating caching 
through memory-mapping, and eliminating locking through MVCC. There's a paper 
describing it in somewhat more detail[1].

I know there is memory-mapping support in SQLite, but it doesn't seem to result 
in as much speedup, and it has some nasty data-corruption bugs on macOS 
(possibly iOS too?)

LSM and LMDB would seem to have differing goals — LSM is best for applications 
with high write throughput, while LMDB is optimized more for read performance. 
I would guess that the latter is closer to the majority of SQLite use cases, 
since small/embedded systems tend to take in less data than big servers do.

—Jens

[1]: http://www.lmdb.tech/media/20120829-LinuxCon-MDB-txt.pdf
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite with branching

2019-11-04 Thread Jens Alfke


> On Nov 4, 2019, at 4:57 AM, Simon Slavin  wrote:
> 
> That's one of the reasons that the source code for SQLite is public: so that 
> people can add the features they want.

Totally agree. However, when you go off the mainline of SQLite you lose some 
things, like easy updating to new SQLite releases — you now have to deal with 
merging the new official SQLite into the forked SQLite, or waiting for the fork 
maintainer to do it.

In the case of LiteTree, I suspect the merge would be pretty difficult because 
of the extensive changes — it must be replacing the whole B-tree layer to be 
using LMDB as storage. (There was an earlier project called SQLightning that 
did the same thing. I was tempted by it, but it was based on an old version 
like 3.9 and the author made it clear he had zero interest in updating.)

I don't have a practical use for the branching features, though they're cool, 
but I'm salivating at the thought of a 2x speedup. With all the work that's put 
into eking out small performance increases in SQLite, I'd imagine the devs 
would be interested in something that made that big of a difference...

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Roadmap?

2019-10-28 Thread Jens Alfke


> On Oct 27, 2019, at 1:50 PM, Thomas Kurz  wrote:
> 
> But not compatible to standards from Open Geospatial Consortium, as far as I 
> know. 

That's the nice thing about standards: there are so many of them.

> Which requires additional conversions, and considering that geodata usually 
> have sizes of a GB or more, this is not an option at all.

Yes, you might have to buy one of those new 4GB hard disks, or upgrade your 486 
box to a Pentium… ;-)

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Roadmap?

2019-10-26 Thread Jens Alfke

> On Oct 26, 2019, at 5:12 PM, Thomas Kurz  wrote:
> 
> Geospatial support would be one of the features I would *LOVE* to see in 
> SQLite :-)

SQLite has had geospatial support for years via the r-tree extension, and more 
recently GeoJSON.

As for time stamps ... I’ve been using SQLite since 2005 and have never felt 
the need to have it do more with dates than store a numeric timestamp in 
seconds-since-Unix-epoch. I have access to a lot of powerful platform APIs to 
do stuff with dates, so I don’t feel a need to have the database do similar 
things for me.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Optimizing `SELECT a, max(b) GROUP BY a`

2019-10-24 Thread Jens Alfke
I'm looking for an optimal way to query the maximum value of one (integer) 
column as grouped by another (string) column. For example, say I've got a table 
of timestamped data points from many named sources, and I want to find each 
source's latest timestamp. I'm seeing decent performance if the database is hot 
in the filesystem cache, but not when it's cold — worst case this query takes 
about 7sec to run on a 2018 MacBook Pro with a decent SSD.

My table schema includes something like `… name text, timestamp integer …`.
The table has about 500,000 rows, with 1700 distinct `name` values, and 
occupies about 500MB on disk.
I'm querying `SELECT name, max(timestamp) FROM table GROUP BY name`.
I've created an index on `name, timestamp desc`.

EXPLAIN QUERY PLAN says:
7|0|0| SCAN TABLE table USING INDEX namesAndTimestamps
Note the dreaded `SCAN`. :-(

I think I understand what's going on here: even if SQLite can tell it only 
needs to use the first index entry for any given name (because that will have 
the largest timestamp), it still has to scan through the index to find where 
the next name starts. (Or is there some optimization to jump rapidly through 
the B-tree to do this?)

Is there anything I can do to the schema, index or query to make this more 
efficient?

One simplifying assumption: I basically never drop any rows, and if I do 
they're old rows, so max(timestamp) for any author will only increase.
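
For concreteness, the schema and the plan described above can be reproduced
with a sketch like this (the table name is a stand-in):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# `samples` stands in for the real table described above.
db.executescript("""
    CREATE TABLE samples (name TEXT, timestamp INTEGER);
    CREATE INDEX namesAndTimestamps ON samples (name, timestamp DESC);
""")

# The GROUP BY query walks the whole index rather than jumping
# from one distinct name to the next.
plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT name, max(timestamp) FROM samples GROUP BY name"
).fetchall()
for row in plan:
    print(row[-1])  # e.g. a SCAN using index namesAndTimestamps
```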

Thanks for any advice…

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Roadmap?

2019-10-20 Thread Jens Alfke

> On Oct 20, 2019, at 12:53 AM, Thomas Kurz  wrote:
> 
> many "playground" gadgets keep being implemented (like virtual columns, 
> virtual tables, FTS3/4/5, ...),

I suspect you are used to database servers, and haven’t used SQLite as an 
embedded library inside an app (its primary use case.) Virtual tables and 
table-valued functions are extremely useful there, as are a bunch of other 
features that probably seem like “playground” to you, like C functions and 
pointer types. The product I work on would be impossible without these.

As for FTS, I can’t see how anyone would consider it frivolous! Full text 
search is very common, something many apps with structured storage eventually 
need, and found in a lot of big database servers. (You didn’t mention 
geo-queries, but the same goes for those. And for obvious reasons they’re 
especially good to have in mobile apps.)

I’m curious which extra SQL features you think should be added, and what your 
use case is.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Jens Alfke


> On Oct 17, 2019, at 1:56 AM, Mitar  wrote:
> 
> So why not increase the limit to 2 billion

Why should SQLite make changes, which would introduce performance problems if 
used, just to save your particular application the trouble of concatenating 
some vectors into single columns, when it uses SQLite for an edge use-case 
that’s pretty far removed from its main purpose?

It may seem like a harmless change to you, but I’m sure it would result in 
increased support work. Someone will report extremely slow query performance, 
and after several back-and-forth emails will mention that, oh yeah, this query 
does access the five millionth column in a table...

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Network file system that support sqlite3 well

2019-10-16 Thread Jens Alfke


> On Oct 16, 2019, at 3:37 AM, Peng Yu  wrote:
> 
> I will need to use the actual files to test for dependency (just as
> the dependency that can be used by GNU make)

I don’t understand what that means. You want to use a makefile that checks the 
mod date of the database?

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-16 Thread Jens Alfke

> On Oct 16, 2019, at 6:08 AM, Mitar  wrote:
> 
> Quite
> some of datasets we are dealing with have 100k or so columns.

There was a thread about this a few months ago. You should not store every 
element of a huge vector in a separate column. You don’t need to query every 
item individually, do you? Just store the vector as one column — JSON works 
well, or a blob containing a C array.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Network file system that support sqlite3 well

2019-10-15 Thread Jens Alfke


> On Oct 15, 2019, at 3:47 PM, Peng Yu  wrote:
> 
> I'd like to use sqlite3 db files on many compute nodes. But they
> should access the same storage device for the sqlite3 db files.

Why not use an actual client-server database system like MySQL? It's optimized 
for this use case, so it incurs a lot less disk (network) I/O.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Table was deleted on macOS

2019-10-15 Thread Jens Alfke
Was the database in a folder managed by Dropbox or Google Drive or iCloud 
Drive, i.e. where some background agent could replace it with a 'newer' copy 
from another computer?

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Last record

2019-10-15 Thread Jens Alfke


> On Oct 15, 2019, at 9:44 AM, Simon Slavin  wrote:
> 
> You can then remember the value(s) of some column(s) of the row returned, and 
> watch for the same one(s) when you do your desired SELECT.

As long as all the result rows are unique…

All of these workaround seem more expensive/complex than just adapting your 
code so it doesn't have to know the last row in advance.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] rationale for private archives of the SQLite mailing list and suppressed reply-to

2019-10-12 Thread Jens Alfke

> On Oct 11, 2019, at 9:19 AM, Mike Bayer  wrote:
> 
> GNU Mailman is still very widely used and IMO does the job very well

Its web interface is like something from 1997. In particular, it makes reading 
archives very painful, clicking through to one message at a time.

I’d recommend groups.io — it hosts mailing lists with excellent browsable 
archives, or from another perspective, web forums with excellent email 
notifications. In my experience it pleases both mailing-list and forum fans.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] insertion of a single record taking more than 200 msec some times

2019-10-12 Thread Jens Alfke

> On Oct 11, 2019, at 7:43 AM, GopiKrishna Parisa 
>  wrote:
> 
> One weird thing is for some insertions, it's taking more than 200 msec for
> single record where as others takes around 20 to 40 (avg 27 msec).

Those sound like numbers I’d expect for committing a transaction. (The longer 
time is for a WAL flush.) But the major performance tip for SQLite is: minimize 
the number of transactions, because commits are expensive.

In your case, try to insert as many records as you can in one transaction. If 
you’re copying them from a text file, just use a single transaction. If they’re 
live, you do want to commit transactions periodically for durability, maybe 
every few seconds.
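
A minimal sketch of the batching approach using Python's sqlite3 module (table
and column names are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(f"payload-{i}",) for i in range(1000)]

# One transaction for the whole batch: the per-commit fsync/WAL cost
# is paid once instead of 1000 times.
with db:  # the context manager wraps the statements in a transaction
    db.executemany("INSERT INTO records (payload) VALUES (?)", rows)

(count,) = db.execute("SELECT count(*) FROM records").fetchone()
print(count)
```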

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SELECT uses index with SUBSTR but UPDATE doesn't

2019-10-09 Thread Jens Alfke


> On Oct 9, 2019, at 10:02 AM, Keith Medcalf  wrote:
> 
> SUBSTR(name, 0, ?) is an expression, so unless you have an index on that 
> expression, then an index cannot be used to SEARCH for the rows.

That's accurate in general. However, there _is_ a very similar special-case 
optimization for the `like` function/operator when it's used to look for a 
prefix. (With all sorts of caveats having to do with which specific patterns 
represent prefix matches, edge conditions about data types, restrictions on 
collation and overloading of 'like'…) So I can see why it would be easy to 
think that SUBSTR has a similar optimization.

It seems like there's a need for an optimizable string-prefix predicate, i.e. 
`str BEGINS WITH prefix` or `begins_with(str, prefix)`, to allow the developer 
to be more explicit about what they mean. Prefix matching is extremely common 
in some use cases. (One of our customers raised an issue about this a few days 
ago…)

BETWEEN doesn't work well because it's inclusive, i.e. `BETWEEN 'foo' and 
'fop'` doesn't work because it matches 'fop'. Coming up with the upper end of a 
string prefix match is super annoying — `BETWEEN 'foo' and 'foo\xff' only works 
until some wise guy adds the key `foo\xff` to the table, and is invalid UTF-8 
anyway.
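
A sketch of the difference, assuming a BINARY-collated index with
`case_sensitive_like` turned on (one of the caveats mentioned above):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE t (name TEXT);
    CREATE INDEX tByName ON t (name);  -- default BINARY collation
""")

# With case_sensitive_like on, a literal 'prefix%' pattern against a
# BINARY-collated index can become an index range search; substr() cannot.
db.execute("PRAGMA case_sensitive_like = ON")

like_plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE name LIKE 'foo%'"
).fetchall()
substr_plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE substr(name, 1, 3) = 'foo'"
).fetchall()
print(like_plan[-1][-1])    # a SEARCH using the index
print(substr_plan[-1][-1])  # a full SCAN
```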

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Date time input

2019-10-08 Thread Jens Alfke

> On Oct 8, 2019, at 5:34 AM, Jose Isaias Cabrera  wrote:
> 
> No, that is not what I was trying to say or ask.  Not even close. What I was 
> trying to say, and most of you missed it was, that if I give date a date 
> format, and I also provide the format of how that date is to be understood, 
> ie

Sorry for misunderstanding. But I think this goes against SQLite’s design goal 
of simplicity. Date-time formatting is complicated, so this would add 
measurably to the library’s footprint*, which would be a problem for using it 
in embedded systems, which would require yet another compile-time configuration 
flag to enable/disable it, which would complicate testing... etc.

I think the idea of a semi-official ”SQLite++” has been floated here before: a 
distro with lots of extensions and a more powerful CLI. I like that idea.

—Jens

* Yes, strptime/strftime are in the standard library. But in an embedded system 
that library is statically linked into your binary (there is no OS), so if 
those functions are not dead-stripped, your footprint goes up.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Date time input

2019-10-07 Thread Jens Alfke


> On Oct 7, 2019, at 6:17 AM, Jose Isaias Cabrera  wrote:
> 
> I have to ask this question:  Why is it that the date function does not take 
> a '4/5/2019' and returns '2019-04-05'? 

Because that's locale-dependent. Some countries, like the US, use 
month/day/year; most other countries use day/month/year. To interpret such a 
date string, SQLite has to know what country's customs to use. And that is a 
pretty significant problem, since:

- Different operating systems communicate locale info in completely different 
ways
- The locale settings may not be applied at the layer of the OS where SQLite is 
running (example: Android only very recently started setting the C-level locale 
to match the GUI locale.)
- The current locale may not match the locale from which the date string 
originates

A database should not have to care about locales. It's supposed to process data 
in globally-consistent ways that don't depend on settings like that.
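
A quick check of what the built-in date() actually accepts:

```python
import sqlite3

db = sqlite3.connect(":memory:")

# SQLite's date() only parses unambiguous ISO-8601-style strings;
# a locale-dependent form like '4/5/2019' yields NULL.
iso, ambiguous = db.execute(
    "SELECT date('2019-04-05'), date('4/5/2019')"
).fetchone()
print(iso)        # 2019-04-05
print(ambiguous)  # None
```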

I swear, half the questions on this list boil down to "Why doesn't 
SQLite act like MS Access?" If you need all the bells and whistles of 
formatting input and output, then use a fancy DBMS application. SQLite is for 
embedded use _inside_ applications.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Calling a java function from a trigger

2019-09-30 Thread Jens Alfke


> On Sep 29, 2019, at 5:05 PM, Faria  wrote:
> 
> Is it possible to call a java function from a trigger in SQLite?

Yes, but you'd have to implement a C function that calls the Java function via 
JNI, then register the C function with SQLite, then call that function in your 
trigger.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] LIKE optimization when the LHS is an indexed expression?

2019-09-27 Thread Jens Alfke
I've been reading about the LIKE optimization[1]. One of the constraints on its 
use is:

> if case_sensitive_like mode is enabled then the column must be indexed using 
> BINARY collating sequence, or if case_sensitive_like mode is disabled then 
> the column must be indexed using built-in NOCASE collating sequence.

Does this also apply when the LHS is not a column but an expression? I.e. does 
the index on that expression need to have BINARY (or NOCASE) collation?

—Jens

[1]: https://sqlite.org/optoverview.html#the_like_optimization
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Safe to use SQLite over a sketchy network?

2019-09-25 Thread Jens Alfke

> On Sep 24, 2019, at 3:48 PM, Keith Medcalf  wrote:
> 
> There are not, to my knowledge, any client/server database systems that will 
> work properly if the database resides on a network filesystem (meaning remote 
> multi-access).  The "client" is remote from the "server" because the "client" 
> and "server" use some sort of IPC mechanism (of which a network is an 
> example) so that the "client" can send commands to and receive responses from 
> the "server".

Well, obviously. “Client/server” means databases like MySQL or Oracle. No one 
would run those with the server using a networked file system.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Safe to use SQLite over a sketchy network?

2019-09-24 Thread Jens Alfke


> On Sep 24, 2019, at 3:22 PM, Jose Isaias Cabrera  wrote:
> 
> Even on a great network, you can have problems, so when you say "sketchy", 
> then definitely there will be problems.

And even with a perfect network and perfect networked filesystem, it's still 
possible to get denial-of-service behavior where one client begins a 
transaction, takes out a lock on the file, and then for one reason or another 
never ends the transaction. Maybe the client hangs, or drops into a debugger, 
or is waiting for user input in the middle of a transaction, or the host it's 
on loses its network connection and it takes a long time for the server to time 
out its connection and clean up after it.

Either way, you get a lengthy period where no other client can write to the 
database — it'll lock up, or fail with SQLITE_BUSY errors, or whatever.

There will be times when a program using SQLite finds itself running over a 
networked filesystem, but no one should deliberately write SQLite-based code 
intending to use a networked filesystem. For that you want a client/server 
database.

—Jens


Re: [sqlite] Safe to use SQLite over a sketchy network?

2019-09-24 Thread Jens Alfke


> On Sep 24, 2019, at 2:14 PM, Randall Smith  wrote:
> 
> My question is: Do the commit-or-rollback semantics of SQLite transactions 
> work if the connection to the file system is weird?

What filesystem is it — SMB, NFS, AFP, …?

I'm not an expert, but in general I think the database file should be OK if the 
connection is dropped, as long as the filesystem has some kind of 'fsync' 
equivalent. The bigger problem is file locking:

> file locking logic is buggy in many network filesystem implementations (on 
> both Unix and Windows). If file locking does not work correctly, two or more 
> clients might try to modify the same part of the same database at the same 
> time, resulting in corruption. Because this problem results from bugs in the 
> underlying filesystem implementation, there is nothing SQLite can do to 
> prevent it.
> 
> A good rule of thumb is to avoid using SQLite in situations where the same 
> database will be accessed directly (without an intervening application 
> server) and simultaneously from many computers over a network.
(from https://www.sqlite.org/whentouse.html)

From what I've heard, this is especially problematic with NFS.

—Jens


Re: [sqlite] SQLite - macOS

2019-09-23 Thread Jens Alfke


> On Sep 23, 2019, at 9:53 AM, Pierre Clouthier wrote:
> 
>   sqlite3_exec("PRAGMA encoding = \"UTF-8\";") 

That isn't necessary. SQLite defaults to UTF-8.

In most cases SQLite doesn't interpret the byte sequences in a string. It just 
knows it's using an 8-bit character set and leaves it at that. So if the text 
is messed up in the database, it's because it was entered in messed-up form.

(Some components do work with non-ASCII text, like FTS, depending on how you've 
built/configured SQLite, and those do interpret UTF-8 sequences.)
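
It's easy to check the default and to see that text round-trips byte-for-byte; a quick sketch using Python's standard sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Newly created databases are UTF-8 already; no pragma needed.
encoding = con.execute("PRAGMA encoding").fetchone()[0]

# Non-ASCII text comes back exactly as it went in, because SQLite
# stores the bytes without reinterpreting them.
con.execute("CREATE TABLE t (s TEXT)")
con.execute("INSERT INTO t VALUES (?)", ("Montréal naïve café",))
row = con.execute("SELECT s FROM t").fetchone()
print(encoding, row[0] == "Montréal naïve café")  # UTF-8 True
```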

—Jens


Re: [sqlite] How to install REGEXP support?

2019-09-21 Thread Jens Alfke

> On Sep 19, 2019, at 7:21 PM, Peng Yu wrote:
> 
> My question is `But it is not clear how to install it for sqlite3 installed by
> homebrew.`

That sounds more like a question to ask the Homebrew community; it isn’t really 
related to SQLite directly. SQLite is just making a system call to load a 
library by name.

—Jens


Re: [sqlite] How to install REGEXP support?

2019-09-17 Thread Jens Alfke

> On Sep 17, 2019, at 8:13 AM, Peng Yu  wrote:
> 
> I can compile it. Where do I put the .dylib file?

According to the documentation Warren linked to, you use a SQL statement or 
shell dot-command to load the extension, giving the path to the library.

—Jens


Re: [sqlite] How to install REGEXP support?

2019-09-16 Thread Jens Alfke


> On Sep 16, 2019, at 1:44 PM, Peng Yu  wrote:
> 
> But it is not clear how to install it for sqlite3 installed by
> homebrew. Does anybody how to install it? Thanks.

SQLite doesn't come with a regexp implementation; it has to be added by an 
extension.
I searched the website; there's a simple implementation here:

https://www.sqlite.org/src/raw/ext/misc/regexp.c?name=a68d25c659bd2d893cd1215667bbf75ecb9dc7d4
 

And regexp is also available as part of the ICU extension:
https://www.sqlite.org/src/dir?ci=ee83d8e30d6f9ca8=ext/icu 


You're going to have to build these yourself, I think.
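
If you'd rather not build anything, and you control the code that opens the database, there's a third option: register REGEXP as an application-defined function. SQLite parses `x REGEXP y` as a call to a two-argument `regexp(y, x)` function that simply isn't defined by default. A sketch using Python's standard sqlite3 module:

```python
import re
import sqlite3

con = sqlite3.connect(":memory:")

# regexp(pattern, value) -> true if value matches pattern.
con.create_function(
    "regexp", 2,
    lambda pattern, value: re.search(pattern, value) is not None)

con.execute("CREATE TABLE words (w TEXT)")
con.executemany("INSERT INTO words VALUES (?)",
                [("apple",), ("banana",), ("cherry",)])

rows = con.execute("SELECT w FROM words WHERE w REGEXP 'an'").fetchall()
print(rows)  # [('banana',)]
```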

—Jens


Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?

2019-09-16 Thread Jens Alfke


> On Sep 13, 2019, at 10:57 AM, Hick Gunter  wrote:
> 
> This is faster if the number of keys in the list is small relative to the 
> number of records in the table.
> If the number of keys is similar to the number of records in the table, then 
> a simple full table scan may be faster.

Experimentally, the optimizer seems to choose an index search even with the 
simpler query. I ran this on a test database with about 30k rows.

> explain query plan select * from kv_default where key in ('a','b','c')

3|0|0|SEARCH TABLE kv_default USING INDEX sqlite_autoindex_kv_default_1 (key=?)
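
The experiment is easy to reproduce; the exact plan wording varies between SQLite versions, but the SEARCH-vs-SCAN distinction is what matters (the schema below is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE kv_default (key TEXT PRIMARY KEY, value TEXT)")
con.executemany("INSERT INTO kv_default VALUES (?, ?)",
                [(f"k{i}", "v") for i in range(1000)])

plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM kv_default WHERE key IN ('a', 'b', 'c')").fetchall()
for row in plan:
    # The detail column reads something like:
    # SEARCH kv_default USING INDEX sqlite_autoindex_kv_default_1 (key=?)
    print(row[-1])
```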

—Jens


Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-16 Thread Jens Alfke


> On Sep 13, 2019, at 1:30 PM, Keith Medcalf  wrote:
> 
>  The only thing that is clear is that where the overhead of executing each 
> select is significant it is clearly better to execute fewer of them.

Thanks for the research, Keith!

In my case the per-query overhead is lower since I'm using the C interface, but 
I'm betting that it's still nontrivial, so I've decided to go with the "dynamic 
in" approach. From your measurements it doesn't look like further optimizations 
like carray are worth it.
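
For the record, "dynamic IN" just means generating one `?` placeholder per key and binding the keys as parameters; a sketch in Python (the C-API version is the same idea, with sqlite3_bind_text in a loop):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (key TEXT PRIMARY KEY, body TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4")])

def fetch_keys(con, keys):
    # One "?" per key. The SQL text depends only on len(keys), so
    # statements for common list sizes can be cached and reused.
    placeholders = ",".join("?" * len(keys))
    sql = f"SELECT key, body FROM t WHERE key IN ({placeholders})"
    return con.execute(sql, keys).fetchall()

print(sorted(fetch_keys(con, ["b", "d"])))  # [('b', '2'), ('d', '4')]
```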

—Jens


[sqlite] Documentation error in comment in carray.c

2019-09-13 Thread Jens Alfke
There's a mistake in the documentation block-comment at the start of carray.c:

**sqlite3_bind_value(pStmt, i, aX, "carray", 0);

The function should be sqlite3_bind_pointer.

—Jens


[sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Jens Alfke
If I have a set of primary keys (let's say a few hundred) and need to fetch 
data from the table rows with those keys, what's the fastest way to do so? The 
options seem to be:

(a) Execute "SELECT … FROM table WHERE key=?", once for each key.
(b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of the 
key strings.

If I do (a), I can pre-prepare the statement and save the overhead of 
compilation. But SQLite has to go through the rest of its work (starting the 
virtual machine, b-tree lookup, etc.) once for each key.

If I do (b), SQLite has less setup work to do, and it could potentially 
optimize the b-tree lookup. On the downside, I have to prepare a statement 
every time since the RHS of an "IN" isn't substitutable.

Does anyone have intuition or actual knowledge about which approach is better? 
Or know of a 3rd better approach?

—Jens


Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices

2019-09-12 Thread Jens Alfke


> On Sep 12, 2019, at 12:33 PM, mailing lists  wrote:
> 
> the application also runs on mobile devices and the insertion (excluding 
> indexing) also takes a couple of minutes. This means that the user might 
> switch to another app during insertion and - in the worst case - the system 
> is just terminating my background app. So, all the insertion is lost in case 
> of a single transaction.

[Off-topic]

Your app should not be getting terminated without notice! Use the platform's 
APIs for requesting background time. I'm not an Android programmer, but here's 
what to do on iOS:

When notified the app is being backgrounded, call 
UIApplication.beginBackgroundTask. The OS will give you more time to run. More 
importantly, you'll be notified when the OS wants you to quit, and can clean up 
(i.e. commit the transaction).

To guard against conditions where the OS needs to terminate the app while it's 
in the foreground (low memory or low battery), implement the 
UIApplicationDelegate method applicationWillTerminate and perform the same 
cleanup there.

Note that you won't get notifications like this if your DB processing is 
blocking the main thread's runloop. To prevent that, do the work on a 
background thread, or at least put the runloop in a modal state and 
periodically run it for an iteration.

—Jens


Re: [sqlite] SQLITE and the memory

2019-09-09 Thread Jens Alfke


> On Sep 8, 2019, at 2:12 PM, Philippe RIO <51...@protonmail.ch> wrote:
> 
> I use the windows task manager to see how the memory is used

I don't use Windows, but I know that in any modern OS, memory usage is a very 
vague thing and is tricky to measure. There are quite a few numbers that mean 
different things, like
- actual RAM in use by the process
- virtual address space allocated
- address space with backing store assigned to it
- address space not being shared with other processes
- address space that's writeable
- address space used for 'malloc' heaps
- address space actually in use in heaps
etc.

I find that when looking at memory usage of a program I'm working on, the stats 
related to heap space are the most useful because they correspond with memory 
my code is involved in allocating and managing.  The farther up that list you 
go, the more you see the effects of things like memory-mapped I/O, shared 
library sizes, filesystem caches, and other things that are usually out of your 
control.

Specific to SQLite: it's usually pretty easy to manage the amount of memory it uses, because most of it goes to the page cache, whose size you can adjust yourself with pragmas like `cache_size`.

—Jens


Re: [sqlite] Why is a multiple MATCH clause not allowed in an FTS query?

2019-09-06 Thread Jens Alfke


> On Aug 30, 2019, at 12:43 AM, mailing lists  wrote:
> 
> SELECT * FROM myData WHERE (content MATCH 'one') AND (body MATCH 'two');
> 
> What is the reason that the above query is not allowed 

I'm curious too. This limitation is documented somewhere, but not the reason 
why.
For programs that generate queries automatically or by translating a different 
input form, it's annoying to work around.

—Jens


Re: [sqlite] sqlite: see: encryption

2019-09-04 Thread Jens Alfke


> On Sep 2, 2019, at 11:41 PM, Vadiraj Villivalam 
>  wrote:
> 
> With the open os like android providing keystore and key generation
> mechanism, we want to switch to this secure key generation mechanism and
> avoid generating  key ourselves. As the key store does not allow the key
> itself to be exported out, I would like to know if sqlite has a mechanism
> to leverage the key store way of en/decrypting it

I'm not familiar with Android's keystore, but I know the iOS/macOS Keychain 
quite well.

If the keystore _only_ supports keys that it creates & manages internally and 
won't export, then you can:
1. Create your own key for SQLite, e.g. by generating 32 securely-random bytes.
2. Ask the keystore to create a symmetric key.
3. Ask the keystore to use its key to encrypt the SQLite key [from step 1].
4. Store the resulting encrypted data somewhere, e.g. in a file.

To recover the SQLite key on the next launch:
1. Read the encrypted data [from step 4 above]
2. Ask the keystore to decrypt it using its managed key.
3. Use the resulting key to open the SQLite database.

But check the docs to see if there's a keystore API that lets you simply store 
a key you've generated yourself. If not, it probably has an API for storing 
passwords; you can then just base64-encode the key and store that as though it 
were a password. That's simpler than going through the above steps.
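
The simpler "store it as a password" path amounts to this (a Python stdlib sketch; the keystore store/retrieve calls are platform-specific and omitted, so the base64 string here just stands in for whatever the keystore hands back):

```python
import base64
import secrets

# 1. Generate a 256-bit database key from a cryptographically secure RNG.
key = secrets.token_bytes(32)

# 2. Encode it as text suitable for a password-style keystore API.
#    (The actual store call is platform-specific and not shown.)
encoded = base64.b64encode(key).decode("ascii")

# 3. On the next launch, decode the stored "password" back into the key.
recovered = base64.b64decode(encoded)
print(recovered == key)  # True
```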

—Jens


Re: [sqlite] database like file archive

2019-08-27 Thread Jens Alfke


> On Aug 27, 2019, at 12:21 PM, Keith Medcalf  wrote:
> 
> Everything that has been touched by a third-party is inherently 
> untrustworthy.  Thus it is and thus it has always been.  

Yes. I have a lot of experience with network coding and security, so I'm aware 
of this, thanks. My question was simply whether SQLite itself is considered 
safe when operating on an untrusted database file.

> Even ZIP files have a database schema that can be manipulated as does 
> everything else.

The layout of a Zip file is vastly simpler than a SQLite database.
A Zip codec does not include an interpreter for a sophisticated programming 
language.
Zip files do not contain program code that runs when the file is read; SQLite 
databases can.

> And how is this in anyway different from a zip process, or a rar processess 
> or an uncompress process or any or a number of possibly trustworthy programs 
> processing data coming from an untrustworthy source?  (which includes things 
> like Web Browsers, Video Players, and on and on)

Codecs used in such apps are considered attack surfaces and are screened for 
vulnerabilities. (For example, Google found and fixed some security holes in 
the TrueType font renderer when they added web-font support to Chrome.) This is 
precisely what I'm asking about SQLite — is it engineered with the assumption 
that a database file may be malicious, or is the assumption "garbage in, 
garbage out"?

> Chrome is a Google product.  Google's only revenue source is selling 
> information that they have obtained from third-
[anti-Google ranting removed]

This is not only off-topic and inaccurate (Google has many other revenue 
sources), it's the sort of scenery-chewing conspiracy theorizing that's beneath 
someone with your level of expertise. Check yo'self.

—Jens


Re: [sqlite] database like file archive

2019-08-27 Thread Jens Alfke


> On Aug 27, 2019, at 7:06 AM, Philip Bennefall  wrote:
> 
> There is the sqlar archive format, which you can test using the official 
> sqlite3 command line shell. There is also a library for it as part of the 
> Sqlite3 repository.

"An SQLite Archive is an ordinary SQLite database file that contains the 
following table as part of its schema…"

Archive files often get transferred between people. Using this format for that 
purpose would involve opening and reading untrusted SQLite database files. Is 
that safe? Could maliciously corrupting the schema or other metadata of a 
database cause security problems for the client accessing the database?

(I'm thinking not just of a separate `sqlite3` process accessing the archive, 
but also of the archiving code running inside some other process — consider a 
web browser or file manager extracting a sqlar archive.)

There were some security issues that came up recently involving the Chrome 
browser allowing untrusted JS code to run SQLite queries on local database 
files. But the scenario I'm thinking of is kind of the reverse — the queries 
are trusted but the database itself isn't.

—Jens

