Re: [sqlite] Please increase the default for SQLITE_MAX_VARIABLE_NUMBER

2020-02-12 Thread Eric Grange
If the trouble comes from a big "IN()", one approach is to pass all the
values in a JSON array (a single parameter) and use json_each in the query.
This is completely safe against SQL injection, and IME quite efficient.

IME, JSON + json_each is also very efficient for filling temporary tables
(indexed if appropriate) when the filter is reused in multiple queries.
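
For illustration, a minimal sketch (the table and column names here are made
up): the whole list is bound as a single JSON text parameter, and json_each()
expands it inside the query:

  -- ?1 is bound to a JSON array such as '[1,2,3]'
  SELECT t.*
  FROM my_table t
  JOIN json_each(?1) j ON j.value = t.id

The same json_each(?1) source can also feed an INSERT ... SELECT into a
temporary table when the filter is reused.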

On Tue, Feb 11, 2020 at 20:39, J. King wrote:

> On February 11, 2020 1:43:30 p.m. EST, Jens Alfke 
> wrote:
> >I ran into this a few months ago. I ended up just biting the bullet and
> >constructing a SQL statement by hand, concatenating comma-separated
> >values inside an "IN (…)" expression.
> >
> >Yes, SQL injection is a danger. But if you're being bad in just one
> >place, and you review that code, you can do this safely. SQLite's C API
> >even has a function that escapes strings for you, and if you're not
> >coding in C/C++, it's easy to write your own; basically
> >   str ⟶ "'" + str.replace("'", "''") + "'"
>
>
> Same here, for what it's worth. Since SQLite also has a 1M byte statement
> length limit I had my application embed terms once an IN() expression
> exceeded a certain number of terms, but used parameters always for string
> terms longer than a couple hundred bytes.
> --
> J. King
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] New word to replace "serverless"

2020-01-27 Thread Eric Grange
Maybe "edge" database ? Or "local" database ? Both are trending terms, on
the theme of taking control and performance back from the cloud.

"Embedded" would be technically good, but is often associated with devices
and small things these days.


On Tue, Jan 28, 2020 at 05:58, Rowan Worth wrote:

> On Tue, 28 Jan 2020 at 06:19, Richard Hipp  wrote:
>
> > Note that "in-process" and "embedded" are not adequate substitutes for
> > "serverless".  An RDBMS might be in-process or embedded but still be
> > running a server in a separate thread. In fact, that is how most
> > embedded RDBMSes other than SQLite work, if I am not much mistaken.
> >
>
> I think embedded does capture SQLite well though. For a lot of devs the
> target API is the important thing, and whether there are threads behind the
> scenes is something of an implementation detail. But it is certainly a nice
> feature of SQLite's implementation, perhaps "embedded, threadless" would
> work to clarify that (although it's not an objectively true description
> once WORKER_THREADS enter the equation).
>
> "in-thread" also has a certain appeal - it's not a term I've seen used
> before but it makes sense as a stronger version of "in-process."
>
> I can't find any general terms for a library which spawns threads vs. one
> which doesn't.
> -Rowan
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-03 Thread Eric Grange
> Indeed turning off memstatus leads to a 500% (from ~3s to ~0.5s)
performance increase.
> Changing the threading mode or the indirection level of the mutexes calls
seems to have no significant effect.

That is quite significant.
Looking at the code, it seems the mutex requirement is mostly there for the
soft heap limit; maybe there could be a way to use atomic instructions to
maintain the stats?
The highwater stat could be handled by a CAS loop, and this should make
the memstatus data cheaper overall when the soft heap limit is not required.

(my use case involves hundreds of independent SQLite databases in a single
process; I have not benchmarked it yet, but I am probably hitting that
mutex hard as well)

Eric




On Fri, Jan 3, 2020 at 17:36, Keith Medcalf wrote:

> On Friday, 3 January, 2020 09:30, sky5w...@gmail.com wrote:
>
> >I get SQLITE_MISUSE when attempting
> >sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
> >immediately after opening a db connection?
> >My connection has THREADSAFE = 1.
>
> That is correct.  You must configure the library before it is initialized,
> not after.
>
> https://sqlite.org/c3ref/config.html
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance vs. memory trade-off question

2019-12-14 Thread Eric Grange
While I am using SQLite mostly on x86-64 with plentiful RAM, I suspect the
reduced lookaside size will help the CPU L1/L2 caches, which are definitely
never plentiful. I would not be surprised if it resulted in a speedup in
practical high-load multi-threaded scenarios.

Also, future SQLite code changes may be able to build further upon the new
lookaside configuration, f.i. by tweaking the structure sizes to more
tightly match the slot sizes, and reduce the performance hit even in
single-threaded cases.

So "on by default" for me.

Eric

On Sat, Dec 14, 2019 at 14:27, Richard Hipp wrote:

> A new feature on a branch has the following disadvantages:
>
> (1)  It uses about 0.25% more CPU cycles.  (Each release of SQLite is
> normally about 0.5% faster, so enabling this feature is sort of like
> going back by one-half of a release cycle.)
>
> (2)  The code space (the size of the library) is between 400 and 500
> bytes larger (depending on compiler and optimization settings).
>
> Then this one advantage:
>
> (3)  Each database connection uses about 72 KB less heap space.
>
> QUESTION:  Should this feature be default-on or default-off?
>
> What's more important to you?  0.25% fewer CPU cycles or about 72KB
> less heap space used per database connection?
>
> The feature can be activated or deactivated at start-time, but you
> take the disadvantages (the performance hit and slightly larger
> library size) regardless, unless you disable the feature at
> compile-time.  If the feature is compile-time disabled, then the
> corresponding code is omitted and it cannot be turned on at
> start-time.
>
> If you have opinions, you can reply to this mailing list, or directly to
> me.
>
> --
> D. Richard Hipp
> d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Eric Grange
> I wrote earlier that for us use case where we are reading whole rows is
the most common one.
> [...]
> we are looking for ways to store this in a stable format which will be
supported for next 50 years, without modifying to original data too much.

If you do not need access to individual columns too often, but still want to
be able to operate on them when needed,
my suggestion would be to store them as JSON in a blob, and use the JSON
functions of SQLite to extract the data
when needed (https://www.sqlite.org/json1.html).

This would give you both the stability of a text format (JSON) for long-term
storage and the ability to run queries, index and manipulate the data with SQL.

I am using such an approach successfully, and when some fields in the JSON
are flagged for indexing or direct access
(f.i. for summaries / descriptions), I just extract them to a relational
table which can then be indexed and used directly
as usual (you can use triggers to automate that extraction).
You may also be able to use indexes on expressions if you only need indexing
(https://www.sqlite.org/expridx.html).
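
A minimal sketch of that layout (the table name and JSON field names are made
up for the example):

  CREATE TABLE samples (
     id  INTEGER PRIMARY KEY,
     doc TEXT NOT NULL   -- one JSON object holding all the values of the row
  );

  -- extract an individual value only when needed
  SELECT json_extract(doc, '$.GENE_0042') FROM samples WHERE id = 1;

  -- index a frequently queried field without duplicating it into a column
  CREATE INDEX samples_gene_0042_idx ON samples (json_extract(doc, '$.GENE_0042'));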

Eric

On Thu, Oct 17, 2019 at 14:50, Mitar wrote:

> Hi!
>
> This is getting a bit off topic.
>
> On Thu, Oct 17, 2019 at 12:07 PM Simon Slavin 
> wrote:
> > 1) Almost no piece of software can handle a grid 2 billion cells wide.
> Excel maxes out at 16,384 columns.  Matlab can store and retrieve a cell of
> data directly from a file, but it has a max array size of 1.  R maxes
> out at 2147483647, which is more than 2 billion.  But R has to hold all the
> data from a matrix in memory at once and it can't assign enough memory to
> one object to hold that many cells.
>
> Of course, 2 billion is a lot. But 100k is something many ML libraries
> support. Pandas, ndarray, R. Not something to magical about that.
>
> > 2) Object names are not data.  They're descriptions in your favourite
> human language.  They're not meant to have weird sequences of characters in.
>
> Not sure what this relates to.
>
> > 3) Lots of CSV import filters ignore a column header row, or can only
> create fieldnames with certain limits (max length, no punctuation
> characters, etc.).  So you should expect to lose fieldnames if you try to
> import your data into some new piece of software.
>
> Does SQLite have limitations on what can be a column name? If not,
> then I would not worry what some CSV importers do. We would use a good
> one to convert to SQLite.
>
> > (4) SQLite stores all the data for a row is together, in a sequence.  If
> you ask for the data in the 3756th column of a row, SQLite has to read and
> parse the data for the first 3755 columns of that row, just to read a
> single value from storage.  As you can imagine, this is slow and involves a
> lot of I/O.  And while it happens the row up to that point must all be held
> in memory.  Consequently, nobody who uses SQLite for its intended purpose
> actually does this.  I dread to think how slow random access over 2 billion
> columns would be in SQLite.
>
> I wrote earlier that for us use case where we are reading whole rows
> is the most common one.
>
> > Your gene expressions are data.  They are not the names of table
> entities.  They should be stored in a table as other posts suggested.
>
> Maybe. But often this data is represented as a row of expressions with
> columns for each gene. Because this is what is being distributed, we
> are looking for ways to store this in a stable format which will be
> supported for next 50 years, without modifying to original data too
> much. I do hear suggestions to do such transformation, but that is
> less ideal for our use case.
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Location of error in SQL statements ?

2019-05-07 Thread Eric Grange
Hi,

Is there a way to get more information about an SQL syntax error message?
For example on a largish SQL query I got the following error message

near "on": syntax error

but as the query is basically a long list of joins, this is not too helpful
;)

Having a character offset, a line number or a larger code snippet would be
enough.

Best regards,
Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] json_group_array() and sorting

2019-01-08 Thread Eric Grange
Shawn's json_group_array(json(o)) works indeed, but it's also 30% slower in
my case than using

'[' || ifnull(group_concat(o, ','), '') || ']'


which is however more case specific and less obvious.
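
For reference, this is Shawn's workaround applied to the ordered-subquery
case I posted earlier:

  select json_group_array(json(o)) from (
     select json_object('id', sb.id) o
     from (select 1 id, 2 field) sb
     order by sb.field desc
  )

which returns [{"id":1}] as expected, instead of an array of JSON strings.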

It would be nice to see the subtype passing improved, as otherwise query
planner improvements
could end up breaking existing queries where the subtype currently survives.

Eric


On Tue, Jan 8, 2019 at 11:41, Dominique Devienne wrote:

> On Tue, Jan 8, 2019 at 11:04 AM Dominique Devienne 
> wrote:
> >
> > On Tue, Jan 8, 2019 at 10:50 AM Eric Grange  wrote:
> >>
> >> Can someone confirm whether this is a bug ?
> >
> >
> > My guess is that it works as "designed", even if this is surprising...
> >
> > I believe that JSON1 leverages value "sub-types" [1], which allow chained
> JSON
> > "documents" to be processed in an optimized fashion (to avoid
> internal-representation
> > to text, and back conversions across JSON1 calls).
> >
> > But when you add sorting to the mix, SQLite probably decides to "lose"
> the subtype
> > and convert to string for some reasons.
> >
> > Look at the plans. SQLite may "flatten" the 1st query, so that the JSON1
> functions "chain"
> > correctly, preserving the subtype, while in the 2nd query's plan, the two
> functions do not
> > "chain" anymore, "losing" the subtype.
> >
> > This is a side-effect of subtypes being a bit of "wart" and not really
> part of the type-system proper.
> > So they are easily lost along the way, in ways which depend on how the
> planner "rewrites" the
> > query, as in your case. Subtypes are still very useful, but more of a
> "pragmatic" solution, than
> > an elegant design, for once in SQLite. IMHO :).
> >
> > This is just a guess though. DRH will likely shed more light on this.
> Thanks, --DD
> >
> > [1] https://www.sqlite.org/c3ref/value_subtype.html
>
> See also [2], which states "values [...] are transient and ephemeral.
> [...]. The pointers will not survive sorting".
>
> The pointer-passing APIs are different from the subtype one, but as [3]
> states, they
> both addressed the same issue, and likely obey similar rules, linked to
> sqlite3_value in general.
>
> This is IMHO what's going on. And I think Shawn's work-around will work :).
> --DD
>
> [2]
>
> https://www.sqlite.org/bindptr.html#restrictions_on_the_use_of_pointer_values
> [3] https://www.sqlite.org/bindptr.html#preventing_forged_pointers
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] json_group_array() and sorting

2019-01-08 Thread Eric Grange
Thanks.

I think I may have encountered a "real" bug while ordering in a subquery.
I have simplified it in the following examples:

select json_group_array(o) from (
   select json_object(
  'id', sb.id
   ) o
   from (
  select 1 id, 2 field
  ) sb
)


the json_group_array returns an array of JSON objects, i.e. [{"id":1}], while in

select json_group_array(o) from (
   select json_object(
  'id', sb.id
   ) o
   from (
  select 1 id, 2 field
  ) sb
   order by sb.field desc
)

so with an added order by in the subquery, it returns an array of JSON
strings, i.e. ["{\"id\":1}"]

In my particular case, I can work around the issue by using group_concat()
rather than json_group_array()

Can someone confirm whether this is a bug ?

Thanks!


On Tue, Jan 8, 2019 at 10:18, Hick Gunter wrote:

> I don't recall that any (aggregate) function is concerned at all about the
> order in which rows are visited. The effect is only visible in
> non-commutative aggregates (e.g. concatenation).
>
> If you want the arguments presented to an aggregate function in a specific
> order, then you need to enforce that order, with an order by clause in a
> subselect if necessary.
>
> If you have an order by clause which is already fulfilled by the
> visitation order, SQLite will not sort again.
>
> -----Original Message-----
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On behalf of Eric Grange
> Sent: Tuesday, 08 January 2019 09:17
> To: General Discussion of SQLite Database <
> sqlite-users@mailinglists.sqlite.org>
> Subject: [EXTERNAL] [sqlite] json_group_array() and sorting
>
> Hi,
>
> Is json_group_array() supposed to honor a sorting clause or not ? (and
> concatenation aggregates in general)
>
> I have a query like
>
> select json_group_array(json_object(
>'id', st.id,
>'num', st.numeric_field,
>...bunch of fields here...
> ))
> from some_table st
> ...bunch of joins here...
> where ...bunch of conditions...
> order by st.numeric_field desc
> limit 50
>
>
> but the resulting JSON array is not ordered according to the "order by",
> but AFAICT by the st.id field (a primary key) When not aggregating, the
> records are in the correct order.
>
> Is it a bug or something expected ?
>
> I can get the proper order when I use a subquery for the joins & filters,
> and aggregate in a top level query, but that is rather more verbose, and I
> am not sure the ordering being preserved in that case is not just
> 'circumstancial' and could be affected by future SQLite query optimizations.
>
> Thanks!
>
> Eric
>
> ___
>  Gunter Hick | Software Engineer | Scientific Games International GmbH |
> Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O)
> +43 1 80100 - 0
>
> May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] json_group_array() and sorting

2019-01-08 Thread Eric Grange
Hi,

Is json_group_array() supposed to honor a sorting clause or not ? (and
concatenation aggregates in general)

I have a query like

select json_group_array(json_object(
   'id', st.id,
   'num', st.numeric_field,
   ...bunch of fields here...
))
from some_table st
...bunch of joins here...
where ...bunch of conditions...
order by st.numeric_field desc
limit 50


but the resulting JSON array is not ordered according to the "order by",
but AFAICT by the st.id field (a primary key).
When not aggregating, the records are in the correct order.

Is it a bug or something expected ?

I can get the proper order when I use a subquery for the joins & filters,
and aggregate in a top-level query, but that is
rather more verbose, and I am not sure whether the ordering being preserved
in that case is not just circumstantial and could
be affected by future SQLite query optimizations.

Thanks!

Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] upsert from select

2018-11-30 Thread Eric Grange
Thanks!

Apparently adding just a "WHERE 1" clause is enough, i.e. this passes:

INSERT INTO vocabulary(word, count)
SELECT * FROM (SELECT  'jovial', 1) WHERE 1
ON CONFLICT(word) DO UPDATE SET count=count+1


and the "WHERE 1" also makes the query with a json_each pass (not just in
the snippet I posted, but also
in the more complex I am actually using)
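
i.e. something along these lines for the json_each variant from my original
message (assuming the WHERE is placed the same way):

  INSERT INTO vocabulary (word, count)
  SELECT atom, 1 FROM json_each('["alpha","beta"]') WHERE 1
  ON CONFLICT(word) DO UPDATE SET count=count+1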

> PS: I used "wcount" rather because "count" is an internal SQL function.

Indeed, though it seems to be accepted here, I am not using a field with that
name in my actual code.
I only used it because it was in the example I copy-pasted from the SQLite
doc; I guess the doc could
be updated (it's https://sqlite.org/lang_UPSERT.html).




On Fri, Nov 30, 2018 at 11:05, R Smith wrote:

> This does seem like a small bug.
>
> While the SQLite devs are having a look, this Zero-cost work-around
> might suit your needs:
> Simply add a WHERE clause, for example:
>
> CREATE TABLE vocabulary (
>word TEXT NOT NULL PRIMARY KEY,
>wcount INT DEFAULT 1
> );
>
> WITH A(w) AS (
>SELECT 'jovial' UNION ALL
>SELECT 'jovial'
> )
> INSERT INTO vocabulary(word)
> SELECT w FROM A WHERE 1
> ON CONFLICT(word) DO UPDATE SET wcount=wcount+1
> ;
>
>
> SELECT * FROM vocabulary
>
>-- word   |wcount
>-- -- | 
>-- jovial |   2
>
>
> PS: I used "wcount" rather because "count" is an internal SQL function.
>
>
> On 2018/11/30 11:14 AM, Eric Grange wrote:
> > Hi,
> >
> > I am running into a little trouble when trying to use and "upsert" from a
> > select clause.
> >
> > Starting from the "vocabulary" exemple at
> > https://sqlite.org/lang_UPSERT.html this works
> >
> > INSERT INTO vocabulary(word, count)
> > SELECT 'jovial', 1
> > ON CONFLICT(word) DO UPDATE SET count=count+1
> >
> >
> > but as soon as the "SELECT" has a from clause it does not seem to be
> > working (near "DO": syntax error)
> >
> > INSERT INTO vocabulary(word, count)
> > SELECT * FROM (SELECT  'jovial', 1)
> > ON CONFLICT(word) DO UPDATE SET count=count+1
> >
> >
> > (without the ON CONFLICT clause the above is accepted)
> >
> > I have tried to place the SELECT between parenthesis, but SQLite then
> > complains of an error on the opening parenthesis.
> > Any workarounds ?
> >
> > My actual usage case would actually use a json_each to provide the values
> > (in case that throws an extra spanner...), like in
> >
> > INSERT INTO vocabulary (word, count)
> > SELECT atom, 1 from json_each('["alpha","beta"]')
> > ON CONFLICT(word) DO UPDATE SET count=count+1
> >
> >
> > Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] upsert from select

2018-11-30 Thread Eric Grange
Hi,

I am running into a little trouble when trying to use an "upsert" from a
select clause.

Starting from the "vocabulary" exemple at
https://sqlite.org/lang_UPSERT.html this works

INSERT INTO vocabulary(word, count)
SELECT 'jovial', 1
ON CONFLICT(word) DO UPDATE SET count=count+1


but as soon as the "SELECT" has a from clause it does not seem to be
working (near "DO": syntax error)

INSERT INTO vocabulary(word, count)
SELECT * FROM (SELECT  'jovial', 1)
ON CONFLICT(word) DO UPDATE SET count=count+1


(without the ON CONFLICT clause the above is accepted)

I have tried to place the SELECT between parentheses, but SQLite then
complains of an error on the opening parenthesis.
Any workarounds?

My actual use case would use json_each to provide the values
(in case that throws an extra spanner...), like in

INSERT INTO vocabulary (word, count)
SELECT atom, 1 from json_each('["alpha","beta"]')
ON CONFLICT(word) DO UPDATE SET count=count+1


Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] SQLite pre-compiled DLL for Windows x64

2018-09-27 Thread Eric Grange
Hi,

For version 3.25.2, the precompiled x64 DLL appears to be missing; there are
several vsix downloads, but when extracting the DLLs bundled in them
(in the Redist/Retail directories), they appear to be different (or
non-standard ones that need some form of post-processing?):
for instance the x86 DLL is larger (1222656 bytes vs 910716 bytes for the
one in the zip), and the x64 DLL fails when loaded with LoadLibrary().

Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Reducing index size

2018-07-31 Thread Eric Grange
 > Maybe you're writing for a fixed-space embedded device, which nonetheless
> has space for the gigabytes required

Actually I am at the other end of the spectrum and running out of SSD space
on the server side,
with the higher SSD storage tiers being quite a bit more expensive.
I am also leaving the realm of "baseline" bandwidth requirements for timely
backup & restore.

Cutting down on redundancy could give a couple of years of breathing room,
hopefully enough
for terabyte SSDs and 10 Gbps networks to become more widespread ;)

> I right that in addition to modifying your queries to extract the
> substrings, you'd have the work of handling substring collisions?

Yes, that means replacing a single "=" test in SQL with two (one for
filtering, one for ensuring the match).

Cutting down to 64 bits is enough in practice to make substring collisions
very unlikely, while dividing
the index size by 4 to 8.
I ran some tests and even 32-bit substrings have few enough collisions not to
be a problem performance-wise
(the lookup is just the first step of typically far more complex queries,
so even if it is 10 times slower,
it does not matter much).


On Mon, Jul 30, 2018 at 7:23 PM, Donald Griggs  wrote:

> There's a good chance this comment won't be useful to you, Eric.
> Nevertheless,
>
> Any chance of relaxing your space requirement?   I.e., what bad things
> happen if the space is not reduced?
>
> Maybe you're writing for a fixed-space embedded device, which nonetheless
> has space for the gigabytes required, and you can't expand without a chip
> change for all your customers?If not such a case, would adding, say,
> $3-worth of additional disk space perhaps solve things?
>
> Also -- If you implement your index on the first several bytes of your
> cryptographic hash, then since you have millions of them per gigabyte,  am
> I right that in addition to modifying your queries to extract the
> substrings, you'd have the work of handling substring collisions?
>
> One of the main reasons for using a database is to automate the creation
> and use of efficient search algorithms with the tradeoff of more
> expenditure on disk space.
>
> I hope you find an acceptable solution,
>Donald
>
>
>
>
>
> On Mon, Jul 30, 2018 at 4:32 AM Eric Grange  wrote:
>
> > Hi,
> >
> > Is there a way to reduce the size of an index on strings/blobs ?
> >
> > I have tables which are a key + value, the key is an "integer primary key
> > autoincrement", which is used for references in all other tables of the
> > schema.
> > The values are cryptographic GUIDs (so 256 to 512 bits in size) with a
> > "compact" encoding (either base64 or blob rather than hexadecimal
> strings),
> > but they still represent gigabytes of data.
> >
> > Those tables have an index on the value, and my problem is that the size
> of
> > this index (as reported by dbstat or sql3_analyzer) is about the same
> > as the table.
> >
> > As these are cryptographic GUIDs, the first few bytes of a values are in
> > practice unique, so in theory I can index just the first few bytes (using
> > substr()),
> > this indeed reduces in a much smaller index, but this also requires
> > adapting all queries that search by value.
> >
> > Before starting down that route, is there another way?
> >
> > My searches on those indexes are by either exact value or by value start
> > (human search & auto-completion)
> >
> > Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Reducing index size

2018-07-30 Thread Eric Grange
@Rowan Worth
> Doesn't that problem already exist with the current index? Except worse
> because it's storing the cryptographic hash *and* the rowid.

No, because SQLite is using a B-Tree (and with cryptographic hashes, it
should even take less effort to balance)



On Mon, Jul 30, 2018 at 12:05 PM, Rowan Worth  wrote:

> On 30 July 2018 at 17:53, Eric Grange  wrote:
>
> > @Rowan Worth
> > > What if you could create a "lite" index, which stores just the rowids
> in
> > a particular order and
> > > refers back to the table for the rest of the column data?
> >
> > As I have millions of rows, and data could get inserted anywhere in that
> > index (given the values are
> > essentially random), maintaining such an index would not be light work.
> >
>
> Doesn't that problem already exist with the current index? Except worse
> because it's storing the cryptographic hash *and* the rowid.
>
> I wasn't suggesting that you manage such an index yourself fwiw, but as a
> potential future feature for sqlite - a simple mechanism to control space
> used by an index.
>
> -Rowan
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Reducing index size

2018-07-30 Thread Eric Grange
@Dominique Devienne
> SQLite supports function-based indexes, but unfortunately it does not support
"function-based columns".

Far-fetched maybe, but could a virtual table or table-valued functions be
used to provide that?

I.e. use the virtual table to pass data directly to an index, and then
expose the data stored in the index as a column.
(I am not knowledgeable enough about SQLite indexes to know how far-fetched
the above would be.)

@Rowan Worth
> What if you could create a "lite" index, which stores just the rowids in
a particular order and
> refers back to the table for the rest of the column data?

As I have millions of rows, and data could get inserted anywhere in that
index (given the values are
essentially random), maintaining such an index would not be light work.

@Simon Slavin
> A later post from OP suggests that the critical point here really is
filespace, not search time,

Yes, this is because in the rest of the schema I am always using the key
(integer), so the value -> key
lookup only happens on API parameters, never in joins or other internal
processing.

The key -> value lookups need to be faster as they happen in SQL outputs,
but at worst each query
will be doing thousands of those. So a little loss of performance
could be acceptable there as well.

Eric


On Mon, Jul 30, 2018 at 11:40 AM, Simon Slavin  wrote:

> On 30 Jul 2018, at 10:25am, Dominique Devienne 
> wrote:
>
> > The former allows you to get what you want, but as you wrote, you must
> > rewrite your queries. The latter,
> > if supported, would allow to move the "function definition" to the
> column,
> > and index the vcolumn directly.
>
> It's the usual speed vs. filespace issue.  And as usual it comes down to
>
> how speed-critical searches are
> how often you do searches, and
> relative frequency of making changes vs. searching
>
> A later post from OP suggests that the critical point here really is
> filespace, not search time, so my previous recommendation is perhaps not
> appropriate for this situation.
>
> Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Quick way to figure SQLite database block size per table & indexes

2018-07-30 Thread Eric Grange
PtrMap pages may be too much overhead in my case; I have occasionally run
vacuum on some databases to see the effect, and it was not very significant.

This is likely because the databases are heavily skewed towards inserting
(and indexing) data rather than updating/deleting, and while the tables are
quite fragmented, since I am using SSDs
the gains from vacuum defragmentation appear marginal.

Eric



On Mon, Jul 30, 2018 at 10:33 AM, Dominique Devienne 
wrote:

> Oops, sent too early...
>
> On Mon, Jul 30, 2018 at 10:29 AM Dominique Devienne 
> wrote:
>
> > On Mon, Jul 30, 2018 at 10:11 AM Eric Grange 
> wrote:
> >
> >> @David Yip
> >> > If you have the dbstat module enabled (-DSQLITE_ENABLE_DBSTAT_VTAB),
> you
> >> > can query the dbstat virtual table for the number of pages used per
> >> table and index, e.g.
> >>
> >> Thanks, I did not know about that module, however it is a bit slow, on
> a 7
> >> GB database I have, it takes almost 5 minutes, and I have larger
> >> databases :/
> >> So it's definitely something that would have to run in the background or
> >> night.
> >>
> >
> > Well, you probably already know that in SQLite, there's a 100 bytes
> > header, then N fixed-sized pages,
> > where the page size is configurable from 512 bytes to 64KB by powers of
> 2.
> > The first page contains
> > the sqlite_master table, which references the "root" page of each table
> > and index. But that's it. To know
> > how many pages each table/index uses, SQLite (or any other tool) will
> need
> > to read each object's root page,
> > and traverse the b-tree of pages rooted at that first page (and their
> > overflow pages). Then repeat for each object.
> > Thus in most cases, all pages of the DB will need to be "paged", i.e.
> > read, so that's a lot of IO, so not quite fast.
> >
> > If your DB has auto or incremental vacuum configured, you've have special
> > PtrMap pages.
> >
>
> See https://www.sqlite.org/fileformat.html#pointer_map_or_ptrmap_pages
>
> Perhaps thanks to those, you'd be able to figure how many pages per object
> faster,
> from just those PtrMap pages, and the root pages from sqlite_master. But
> then you'd need
> to figure this out yourself. And accept the price of SQLite
> maintaining/updating those pages
> for you during "normal" DMLs. I don't know what the overhead is exactly, on
> average. But it
> can't be faster than not maintaining those pages for sure :). FWIW.
>
>
> >
> > @D Burgess
> >> > download sqlite3_analyzer
> >>
> >> Thanks, the text output is interesting with built-in documentation.
> >
> >
> > If I recall correctly, sqlite3_analyzer's output is based on the dbstat
> > vtable, so it's unlikely to be faster I assume. --DD
> >
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Reducing index size

2018-07-30 Thread Eric Grange
@Simon Slavin
> Don't index using substr().  That would be slow because it has to keep
working out substr().

I gave it a try, but that grows the size of the tables, and would require a
full update of the key/value table,
so not something that can be deployed without a lot of I/O.

The substr() itself should not be a problem, I am ready to trade CPU time
for disk space, and I was planning
to use a check() constraint to ensure uniqueness without an explicit unique
index (I can probably live
with the odd duplicate in the substr value, if the space gained is worth
it).

However, the crux of the issue is that all queries doing a "value
= xxx" will have to be rewritten
to something like "(substr(value, 1, 8) = substr(xxx, 1, 8) and value =
xxx)", and then I have to check in the query plan
that the substr index is actually used.

So, if there is a "better" way... :)
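
Something along these lines (the "kv" table and parameter names are made up
for the example):

  -- index only the first 8 bytes of the value
  CREATE INDEX kv_value_prefix_idx ON kv (substr(value, 1, 8));

  -- filter on the indexed prefix first, then confirm the full value
  SELECT id FROM kv
  WHERE substr(value, 1, 8) = substr(:v, 1, 8)
    AND value = :v;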

@Paul Sanderson
> If I understand correctly then changing from a base64 index to a blob 
> containing
the raw bytes would save 25%

Yes it would, but the problem is with human searches / auto-completion:
some keys are displayed as base64,
so humans type in the first base64 characters.
It might be possible to do a partial base64 decoding, and then filter on
re-encoded base64, but that would be
quite complex for "just" a 25% size gain :/

Indexing on the string start still allows the filtering to occur with
"substr(value, 1, 8) between x and y" f.i.

Eric


On Mon, Jul 30, 2018 at 11:16 AM, Paul Sanderson <
sandersonforens...@gmail.com> wrote:

> If I understand correctly then changing from a base64 index to a blob
> containing the raw bytes would save 25%
>
> Paul
> www.sandersonforensics.com
> SQLite Forensics Book <https://www.amazon.co.uk/dp/ASIN/1980293074>
>
> On 30 July 2018 at 09:32, Eric Grange  wrote:
>
> > Hi,
> >
> > Is there a way to reduce the size of an index on strings/blobs ?
> >
> > I have tables which are a key + value, the key is an "integer primary key
> > autoincrement", which is used for references in all other tables of the
> > schema.
> > The values are cryptographic GUIDs (so 256 to 512 bits in size) with a
> > "compact" encoding (either base64 or blob rather than hexadecimal
> strings),
> > but they still represent gigabytes of data.
> >
> > Those tables have an index on the value, and my problem is that the size
> of
> > this index (as reported by dbstat or sql3_analyzer) is about the same
> > as the table.
> >
> > As these are cryptographic GUIDs, the first few bytes of a values are in
> > practice unique, so in theory I can index just the first few bytes (using
> > substr()),
> > this indeed reduces in a much smaller index, but this also requires
> > adapting all queries that search by value.
> >
> > Before starting down that route, is there another way?
> >
> > My searches on those indexes are by either exact value or by value start
> > (human search & auto-completion)
> >
> > Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Reducing index size

2018-07-30 Thread Eric Grange
Hi,

Is there a way to reduce the size of an index on strings/blobs ?

I have tables which are a key + value, the key is an "integer primary key
autoincrement", which is used for references in all other tables of the
schema.
The values are cryptographic GUIDs (so 256 to 512 bits in size) with a
"compact" encoding (either base64 or blob rather than hexadecimal strings),
but they still represent gigabytes of data.

Those tables have an index on the value, and my problem is that the size of
this index (as reported by dbstat or sql3_analyzer) is about the same
as the table.

As these are cryptographic GUIDs, the first few bytes of a value are in
practice unique, so in theory I can index just the first few bytes (using
substr()); this indeed results in a much smaller index, but it also requires
adapting all queries that search by value.

Before starting down that route, is there another way?

My searches on those indexes are by either exact value or by value start
(human search & auto-completion)

Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Quick way to figure SQLite database block size per table & indexes

2018-07-30 Thread Eric Grange
@David Yip
> If you have the dbstat module enabled (-DSQLITE_ENABLE_DBSTAT_VTAB), you
> can query the dbstat virtual table for the number of pages used per table and
index, e.g.

Thanks, I did not know about that module; however it is a bit slow: on a 7
GB database I have, it takes almost 5 minutes, and I have larger
databases :/
So it's definitely something that would have to run in the background or at
night.
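
For reference, a query along these lines gives the per-object page counts
(using the dbstat columns "name" and "pgsize"):

  SELECT name, count(*) AS pages, sum(pgsize) AS bytes
  FROM dbstat
  GROUP BY name
  ORDER BY pages DESC;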

Also, I do not know if it's intentional, but the precompiled sqlite3.dll
from www.sqlite.org has the module
built-in for Win64, but not for Win32.


@D Burgess
> download sqlite3_analyzer

Thanks, the text output is interesting with built-in documentation.


On Mon, Jul 30, 2018 at 9:06 AM, D Burgess  wrote:

> download sqlite3_analyzer
> from
> http://www2.sqlite.org/2018/sqlite-tools-linux-x86-324.zip
>
>
>
> On Mon, Jul 30, 2018 at 4:46 PM, Eric Grange  wrote:
> > Hi,
> >
> > Is there a quick way (as in speed) to obtain the number of database
> blocks
> > (or kilobytes) allocated for each table and each index ?
> >
> > I have been using various manual approaches so far (from using length()
> to
> > vacuum and drops), but none of them are really fast
> > or can practical to automate.
> >
> > Eric
>
>
>
> --
> D Burgess
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Quick way to figure SQLite database block size per table & indexes

2018-07-30 Thread Eric Grange
Hi,

Is there a quick way (as in speed) to obtain the number of database blocks
(or kilobytes) allocated for each table and each index ?

I have been using various manual approaches so far (from using length() to
vacuum and drops), but none of them are really fast
or practical to automate.

Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue with left join in 3.24 compared with 3.22

2018-06-26 Thread Eric Grange
Thanks Richard!

Changing the inner join to a cross join works as well in that case, though
is that enough to always disable the left join optimization?

I have other variants of the query with different/more left joined
tables/subqueries, and varying filtering conditions, as the query
is dynamically generated from user options and filters (which can indeed
lead to SQL that is not really "optimal").

Is having a cross join somewhere among the joins enough to "disable" the
left join strength reduction for other joins?
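
For reference, this is what I mean, i.e. the query from my original message
with the inner join written as a cross join:

  select d.key_field, count(*) nb
  from low_volume_table b
  cross join mid_volume_table c on c.key_b = b.id
  left join high_volume_table d on d.key_c = c.id
  where b.id >= $1
  and b.filter1 = 0
  and c.filter2 > 0
  and d.filter3 > 0
  and d.filter4 = 1
  group by d.key_field
  order by nb desc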

On Tue, Jun 26, 2018 at 5:58 PM, Richard Hipp  wrote:

> On 6/26/18, Eric Grange  wrote:
> > I am experiencing a massive performance issue on a query with a left join
> > in 3.24, now taking 40 seconds, whereas in 3.22 it was just a few
> > milliseconds.
> > The problematic query looks like
> >
> >   select d.key_field, count(*) nb
> >   from low_volume_table b
> >   join mid_volume_table c on c.key_b = b.id
> >   left join high_volume_table d on d.key_c = c.id
> >   where b.id >= $1
> >   and b.filter1 = 0
> >   and c.filter2 > 0
> >   and d.filter3 > 0
> >   and d.filter4 = 1
> >   group by d.key_field
> >   order by nb desc
>
> Dan bisected and found the speed reduction coincides with the
> introduction of the LEFT JOIN strength reduction optimization
> (https://sqlite.org/optoverview.html#leftjoinreduction) in version
> 3.23.0.  (Dan bisected to the specific check-in
> https://sqlite.org/src/info/dd568c27).
>
> Your work-around is to change the LEFT JOIN into CROSS JOIN, thus
> forcing the query planner to preserve the same join order as it did
> before the string reduction optimization.
>
> We (the SQLite devs) will take an action to try to improve the query
> planner so that it picks a better plan for your case.
>
>
> --
> D. Richard Hipp
> d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue with left join in 3.24 compared with 3.22

2018-06-26 Thread Eric Grange
I also tried a few ways of "forcing" the query plan, but with limited success:

- the "indexed by" clause does not result in the optimizer using the index
first, it just uses the indexes in the later steps of the query plan.
- using "not indexed" still results in the same table scan of
high_volume_table first, just without the index.
- using the unary "+" on the d table filters has no effect on the query
plan (as these are not indexed in the first place I guess)

Using unlikely() on the d table filters seems to be the only option that
works.
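
i.e. the same query as in my original message, with the d table filters
wrapped in unlikely():

  select d.key_field, count(*) nb
  from low_volume_table b
  join mid_volume_table c on c.key_b = b.id
  left join high_volume_table d on d.key_c = c.id
  where b.id >= $1
  and b.filter1 = 0
  and c.filter2 > 0
  and unlikely(d.filter3 > 0)
  and unlikely(d.filter4 = 1)
  group by d.key_field
  order by nb desc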


On Tue, Jun 26, 2018 at 10:02 AM, Eric Grange  wrote:

> Hi,
>
> I am experiencing a massive performance issue on a query with a left join
> in 3.24, now taking 40 seconds, whereas in 3.22 it was just a few
> milliseconds.
> The problematic query looks like
>
>   select d.key_field, count(*) nb
>   from low_volume_table b
>   join mid_volume_table c on c.key_b = b.id
>   left join high_volume_table d on d.key_c = c.id
>   where b.id >= $1
>   and b.filter1 = 0
>   and c.filter2 > 0
>   and d.filter3 > 0
>   and d.filter4 = 1
>   group by d.key_field
>   order by nb desc
>
> The filter fields on it are relatively non-discriminant (and
> non_indexed), however the key_field is indexed.
>
> The most discriminating conditions in this query are those on the
> low_volume and mid_volume tables, but the optimizer
> selects as first action:
>
> SCAN TABLE high_volume_table USING INDEX key_field_idx
>
> which leads to a huge number of iterations.
>
> If on the other hand, just one of the d filter conditions is removed, then
> the optimizer goes (like 3.22) first for
>
>SEARCH TABLE low_volume_table AS b USING COVERING INDEX
> low_volume_table_id_idx (b.filter1=? AND rowid>?)
>
> This happens after running ANALYZE, the sqlite1_stat for the high_volume
> table and key_field_idx is
>
>  5855234 6
>
> while for the  low_volume_table_filter1_idx it is
>
>  1976628 988314
>
> While the  low_volume_table_filter1_idx does not look very selective, as
> it is coupled with rowid filtering, it is actually very effective
> as it combines rowid & filter1, so there are just thousandths of rows
> being considered in the "group by", while when starting from
> a key_field_idx, there are millions of rows being considered, the
> overwhelming majority not fulfilling the conditions.
>
> The above table names and fields have been anonymized, if someone from the
> SQLite team want to have a look at the actual data,
> I can provide a database download (it's about 1.7 GB)
>
> Thanks!
>
>
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Performance issue with left join in 3.24 compared with 3.22

2018-06-26 Thread Eric Grange
Hi,

I am experiencing a massive performance issue on a query with a left join
in 3.24, now taking 40 seconds, whereas in 3.22 it was just a few
milliseconds.
The problematic query looks like

  select d.key_field, count(*) nb
  from low_volume_table b
  join mid_volume_table c on c.key_b = b.id
  left join high_volume_table d on d.key_c = c.id
  where b.id >= $1
  and b.filter1 = 0
  and c.filter2 > 0
  and d.filter3 > 0
  and d.filter4 = 1
  group by d.key_field
  order by nb desc

The filter fields on it are relatively non-discriminant (and
non-indexed); however, the key_field is indexed.

The most discriminating conditions in this query are those on the
low_volume and mid_volume tables, but the optimizer
selects as first action:

SCAN TABLE high_volume_table USING INDEX key_field_idx

which leads to a huge number of iterations.

If on the other hand, just one of the d filter conditions is removed, then
the optimizer goes (like 3.22) first for

   SEARCH TABLE low_volume_table AS b USING COVERING INDEX
low_volume_table_id_idx
(b.filter1=? AND rowid>?)

This happens after running ANALYZE; the sqlite_stat1 entry for the high_volume
table and key_field_idx is

 5855234 6

while for the  low_volume_table_filter1_idx it is

 1976628 988314

While the low_volume_table_filter1_idx does not look very selective, as it
is coupled with rowid filtering it is actually very effective,
as it combines rowid & filter1, so there are just thousands of rows being
considered in the "group by", while when starting from
a key_field_idx, there are millions of rows being considered, the
overwhelming majority not fulfilling the conditions.

The above table names and fields have been anonymized; if someone from the
SQLite team wants to have a look at the actual data,
I can provide a database download (it's about 1.7 GB).

Thanks!
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to Handle BigInt

2018-05-03 Thread Eric Grange
A quick insight I gleaned from this list some time ago: if you only need to
be able to search and sort, blobs can be used.

If you can live with a fixed size (f.i. 256 bits), then just use the
fixed-size big-endian representation as the blob content.

For variable-length big integers, encode them as variable-length
big-endian, then prepend the byte length, itself encoded big-endian at a
fixed size (a word in the example below), so encoding 123456 would be the
blob x'000301E240' (of course, if your big integers are never going to grow
too large, a single byte is enough to encode the length).

SQLite will not be able to do arithmetic on those or convert them to string
or anything, but you can add custom functions to handle that format should
you need them.

Eric


On Thu, May 3, 2018 at 1:54 PM, Simon Slavin  wrote:

> On 2 May 2018, at 6:08pm, Thomas Kurz  sqlite.org> wrote:
>
> > Are there any plans for supporting a true BigInt/HugeInt data type (i.e.
> without any length restriction) in the near future?
>
> The largest integer storable as an integer is currently 2^63-1, which is the
> value as signed BigInt in many libraries.  In other words, SQLite already
> does BigInt.  Just the same as SQL Server, MySQL, Postgres and DB2.
>
> I have not seen any plans for anything bigger than that.
>
> Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite3_column_decltype and affinity

2018-04-02 Thread Eric Grange
I understand the type can differ between rows, but that's also the case for
a regular table, and
https://www.sqlite.org/datatype3.html#affinity_of_expressions says

   << An expression of the form "CAST(expr AS type)" has an affinity that
is the same as a column with a declared type of "type". >>

so the sql engine already has to keep track of the affinity specified
through "cast()" if I understood the above correctly

On Sun, Apr 1, 2018 at 3:19 PM, Simon Slavin <slav...@bigfraud.org> wrote:

> On 1 Apr 2018, at 8:19am, Eric Grange <egra...@glscene.org> wrote:
>
> > The decl_type allows to differentiate a datetime, or a "wide" text from a
> > single char text column
>
> The reason descl_type does not work on expressions is the reason I
> illustrated in my previous post: the expression can have a different type
> for different rows of the result and therefore has no affinity throughout
> the entire set of results.
>
> You could, perhaps, check the types of the first row of results, and
> assume that all rows will have the same types.
>
> Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite3_column_decltype and affinity

2018-04-01 Thread Eric Grange
Yes, I know about column_type, but it returns only integer/text/etc.

The decl_type allows differentiating a datetime, or a "wide" text from a
single-char text column, etc., which are all useful for presentation
purposes when the fields have been declared properly.
I was hoping to be able to recover that info when it is provided through a
cast.

On Fri, Mar 30, 2018 at 6:04 PM, Simon Slavin <slav...@bigfraud.org> wrote:

> On 30 Mar 2018, at 11:22am, Eric Grange <zar...@gmail.com> wrote:
>
> > Is there a way to have sqlite3_column_decltype return the affinity for an
> > expression ?
>
> You may be referring to
>
> sqlite3_column_type()
>
> which can be applied to columns returned by a query even if that column is
> an expression.  But if you want to do it accurately for every row it can't
> be done at the column level, because an expression has a datatype and not
> an affinity, and different rows of the same expression might have a
> different type:
>
> SELECT month,
>CASE weekday
>WHEN 6 THEN 'weekend'
>WHEN 7 THEN 'weekend'
>ELSE weekday
>END
>FROM deliverydates;
>
> To handle that properly I think you'd have to call sqlite3_value_type()
> for each value returned.
>
> Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] sqlite3_column_decltype and affinity

2018-03-30 Thread Eric Grange
Hi,

Is there a way to have sqlite3_column_decltype return the affinity for an
expression ?

https://www.sqlite.org/datatype3.html#affinity_of_expressions

states that a CAST can be used to specify the affinity of a column, however
sqlite3_column_decltype does not report that affinity, as is documented in

https://www.sqlite.org/c3ref/column_decltype.html

I would like to have the affinity for purposes of presenting the results of
an SQL query (for column display width etc.).
Thanks!

Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The upcoming 3.23.0 release

2018-03-23 Thread Eric Grange
> Add the LEFT JOIN strength reduction optimization that converts a LEFT
JOIN into an ordinary JOIN

A question on this one: I have been using LEFT JOIN for many years (and not
just in SQLite) in cases where a JOIN could have been used, as a way to
"hint" query optimizers which tables to scan first, typically in cases
where the WHERE is too complex or misleading for the optimizer (i.e.
whenever index statistics paint a wrong picture, for whatever specific
reasons).
I can see that change "breaking" optimized queries back to their original
performance.

Is a "QUERY PLAN" hint or some lightweight hinting mechanism in the works?
The current suggestions in the doc are to use cross joins or stat3/stat4
tables, but both are quite a bit more complex to maintain, while just
introducing a "LEFT" before a "JOIN" in a particular query used to be enough :)

On Fri, Mar 23, 2018 at 1:06 AM, Olivier Mascia  wrote:

> > On 22 March 2018 at 20:09, Richard Hipp wrote:
> >
> > ...
> > Please download the latest Pre-release Snapshot
> > (https://sqlite.org/download.html) and test out the latest SQLite in
> > your applications.  Report any issues, either to this mailing list, or
> > directly to me at d...@sqlite.org.
>
> I am of course not really much concerned by this detail — yet I'm writing
> an email about it :).  Since 3.22 and now with 3.23 snapshot as of today,
> latest updates of Visual Studio 2017 report this warning:
>
> "shell.c(2377): warning C4996: 'chmod': The POSIX name for this item is
> deprecated. Instead, use the ISO C and C++ conformant name: _chmod."
>
> It occurs at two places in shell.c, and the same thing occurs once for
> unlink (_unlink).
>
> In their latest versions of the documentation about their UCRT they say:
>
> > The C++ standard reserves names that begin with an underscore in the
> global namespace to the implementation. Because the POSIX functions are in
> the global namespace, but are not part of the standard C runtime library,
> the Microsoft-specific implementations of these functions have a leading
> underscore. For portability, the UCRT also supports the default names, but
> the Visual C++ compiler issues a deprecation warning when code that uses
> them is compiled. Only the default POSIX names are deprecated, not the
> functions. To suppress the warning, define _CRT_NONSTDC_NO_WARNINGS before
> including any headers in code that uses the original POSIX names.
>
> See: https://docs.microsoft.com/en-us/cpp/c-runtime-library/compatibility
>
> Adding a #define _CRT_NONSTDC_NO_WARNINGS to shell.c might indeed be a
> nice way to go.
>
> --
> Best Regards, Meilleures salutations, Met vriendelijke groeten,
> Olivier Mascia
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Efficient ways to maintaining order rank / row_number() in a rather large set ?

2018-01-12 Thread Eric Grange
> But if you have million rows this could involve a lot of number-shuffling
and I can see that it might not work out in the real world.

Yes, also while the value field can change several times per second
(thousands of times in some cases), it is acceptable to have the ranking
updated at a lower frequency and display somewhat "stale" rankings & values.

> This should not be true.  You should not be using OFFSET.  Your queries
should be something like

That was with only the "value" field being indexed (so without a rank
field); is there a better way than OFFSET in that case?


On Tue, Jan 9, 2018 at 6:21 PM, Simon Slavin <slav...@bigfraud.org> wrote:

> On 9 Jan 2018, at 11:35am, Eric Grange <egra...@glscene.org> wrote:
>
> > In both cases, since things are constantly in flux, the absolute rank and
> > neighbor do not really matter
> > (outside the top ranks), but changes in rank are being looked at.
> > i.e. having rank 155k or 154k is not really meaningful in itself, but on
> > the other hand
> > gaining 1000 "ranks" by going from 155k to 154k is what users are after.
>
> Okay, I see what you mean.  You have a special, unusual, case where the
> ranks change frequently and you have a genuine need to do things like "give
> me all the ranks from 154k to 155k".
>
> That’s a very difficult thing to do quickly.  The fastest way to do it
> would be different depending on which you do more: change ranks or query
> ranks.
>
> The cannonical SQL way would be that every time a figure changes you would
> change the ranks of all the rows between the two positions.  This would
> take only two UPDATE commands each time.  The data would be up-to-date all
> the time and therefore queries would be fast.  But if you have million rows
> this could involve a lot of number-shuffling and I can see that it might
> not work out in the real world.
>
> >> But then, if your range queries are based on a rank derived from value,
> why
> >> not index value directly? You'd still get fast range queries based on
> values, no?
> >
> > You get fast value range queries, but rank range queries become slower
> and
> > slower the farther away you get from the top rank.
>
> This should not be true.  You should not be using OFFSET.  Your queries
> should be something like
>
> SELECT * FROM ranked WHERE rank BETWEEN 154000 AND !55000 ORDER BY
> rank
>
> which should be lightning fast because you have an index on "rank".
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Efficient ways to maintaining order rank / row_number() in a rather large set ?

2018-01-09 Thread Eric Grange
> Do you actually have a need to find the 4512nd rank ?

Yes, this is used to display and check on "neighbors", ie. keys with
similar rank.
The other very common query is to show the rank for a given key.

In both cases, since things are constantly in flux, the absolute rank and
neighbor do not really matter
(outside the top ranks), but changes in rank are being looked at.
i.e. having rank 155k or 154k is not really meaningful in itself, but on
the other hand
gaining 1000 "ranks" by going from 155k to 154k is what users are after.


> 3) In your programming language, number the rows by doing
>
>SELECT KEY FROM RANKED ORDER BY VALUE
>
>and for each row returned doing an UPDATE for the RANK value.
>It’ll be slower, but it won’t take up the huge chunk of memory needed to
keep that index in memory.

Yes, that's what I tried, but it is indeed much slower than deleting the
whole table and inserting everything all at once,
because as soon as one key moves from top tier to bottom (or vice versa),
you have to touch basically all records,
and doing so with many updates wrecks the performance completely.

> But then, if your range queries are based on a rank derived from value,
why
> not index value directly? You'd still get fast range queries based on
values, no?

You get fast value range queries, but rank range queries become slower and
slower the farther away you get from the top rank.
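
For illustration, a minimal sketch of why (using the RANKED schema from the
original post further down; :key is a bound parameter): with only an index
on VALUE, finding the rank of a key means counting how many larger values
exist, and that count scans more and more index entries as the rank gets
lower.

   SELECT COUNT(*) + 1
     FROM RANKED
    WHERE VALUE > (SELECT VALUE FROM RANKED WHERE KEY = :key);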

Also while most end-user queries are for the top 100 / top 1000, those
results are cached and only infrequently
hit the db (so even if finding out the top 1000 was slow and inefficient,
it would not really matter).
In practice, the queries that really hit on the db are for "random" keys
far from the top 1000.

Eric


On Tue, Jan 9, 2018 at 11:44 AM, Dominique Devienne <ddevie...@gmail.com>
wrote:

> On Tue, Jan 9, 2018 at 11:26 AM, Eric Grange <egra...@glscene.org> wrote:
>
> > So the order by is used to control the insertion order, so that the RANK
> > autoinc primary key ends up with natural rank order
>
>
> But then, if your range queries are based on a rank derived from value, why
> not index value directly?
> You'd still get fast range queries based on values, no? That probably
> forces you app to maintain a
> mapping from "rank" to values, but you don't need to know all ranks, just
> the few necessary to know
> where to "page" from, no? (assuming I'm guess what you use rank for
> correctly).
>
> SQLite then maintain that index automatically, no? I'm probably missing
> something though. Sounds too simple... --DD
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Efficient ways to maintaining order rank / row_number() in a rather large set ?

2018-01-09 Thread Eric Grange
You mean using limit / offset instead ?

Even with an index on the VALUE column, queries like

select * from ranked
order by value
limit 10 offset xxx

become very slow when xxx is great, while

select * from ranked
where rank between xxx and xxx+9
order by rank

are fast regardless of the value of xxx

Similarly, without a rank column, finding the rank of a key becomes sluggish
for keys that are not near the top.

So the order by is used to control the insertion order, so that the RANK
autoinc primary key ends up with natural rank order



On Tue, Jan 9, 2018 at 10:59 AM, Simon Slavin <slav...@bigfraud.org> wrote:

> On 9 Jan 2018, at 9:50am, Eric Grange <zar...@gmail.com> wrote:
>
> > then I fill that table with something like
> >
> >   INSERT INTO RANKED
> >  SELECT key, value
> >  FROM ...something rather complex and big...
> >  ORDER BY value desc
> >
> > This works well enough, but as the amount of values to be ranked
> increases,
> > this feels wasteful to delete everything and then re-insert
> > everything just to adjust the RANK column, also I am running into memory
> > issues as the ORDER BY requires a temporary b-tree
> > which runs into the gigabyte range in some instances.
>
> The ORDER BY clause serves no useful value here.  Leave it out.  Do your
> sorting when you query the table.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Efficient ways to maintaining order rank / row_number() in a rather large set ?

2018-01-09 Thread Eric Grange
Hi,

I have a problem where I have a large set of (key, value) which I want to
sort by value, and then store in a table having (rank, key, value) fields,
so that for a given key I can quickly find the rank, or for a given rank
range, I can quickly list the keys & values.

Since there is no ROW_NUMBER() function, but there is an autoincrement
feature, and the ranks are numbered 1, 2, 3 etc., the strategy I have
been using is to create a ranked table like

   CREATE TABLE RANKED (
  RANK INTEGER PRIMARY KEY AUTOINCREMENT,
  KEY INTEGER,
  VALUE FLOAT
   )

(+ an index for the key)

and then I fill that table with something like

   INSERT INTO RANKED
  SELECT key, value
  FROM ...something rather complex and big...
  ORDER BY value desc

This works well enough, but as the number of values to be ranked increases,
it feels wasteful to delete everything and then re-insert everything just to
adjust the RANK column; I am also running into memory issues, as the ORDER BY
requires a temporary b-tree which runs into the gigabyte range in some
instances.

I have ways to maintain the KEY and VALUE fields individually and
incrementally, but the approaches I have tried for maintaining the RANK
with UPDATE queries ran much slower than deleting and recreating
everything, though this could just be bad implementations on my part.

Are there any other strategies I could use that could update just the RANK
field and mitigate the temporary B-tree size?

Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs

2017-09-11 Thread Eric Grange
> I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE
DS = 15;".

Not zipvfs specific, but that kind of update can be quite inefficient if the
record is large (in terms of bytes, i.e. few records per database page), as
your table declaration hints.

This will be especially prominent with compression, as a single byte
changed early in the uncompressed stream can lead to changing pretty much
every byte of the compressed data, and so is bound to defeat any
optimizations the storage could have to handle unmodified pages and data.

Depending on your update patterns and frequencies, it could thus be
beneficial to split your table into two tables (or more) with a common
primary key:
- one table for the rarely changed fields (even better if they are
immutable)
- one table (or more) for the fields that change often

This will increase the database size and reduce insert performance, and it
can complicate querying when you need everything (extra inner joins), but
it can help reduce the amount of raw data that is affected by updates quite
drastically.

I have been able to use the above strategy successfully, however it was
under Windows compressed folders, so YMMV.
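
As a minimal sketch of such a split, based on the schema quoted below (the
table names and the reduced field list are made up for illustration):

   CREATE TABLE ITEM_STATIC (
      ID INTEGER PRIMARY KEY,
      DS TEXT,
      FIELD0 NUMERIC,
      FIELD1 NUMERIC
   );
   CREATE INDEX DS_INDEX ON ITEM_STATIC(DS);

   CREATE TABLE ITEM_DYNAMIC (
      ID INTEGER PRIMARY KEY REFERENCES ITEM_STATIC(ID),
      FIELD4 NUMERIC
   );

   -- the frequent update now rewrites only the small ITEM_DYNAMIC rows,
   -- so far fewer compressed pages are touched:
   UPDATE ITEM_DYNAMIC SET FIELD4 = 10
    WHERE ID IN (SELECT ID FROM ITEM_STATIC WHERE DS = '15');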


On Thu, Sep 7, 2017 at 9:13 PM, Yue Wu  wrote:

> Hello,
> As mentioned in the subject, our goal is to improve performance regarding
> to batch sql updates.
> The update sql as follow,
>
> > UPDATE ITEM SET FIELD4 =? WHERE DS=?
>
> We run 100,000 updates in a single transaction. The zipvfs version takes
> about 20 min while uncompressed version takes about 7 min.
> I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS
> = 15;".
>
> Any suggestions is very helpful.
>
> Some background:
> We have an app running on Android 4.2 using zipvfs.
> The page size is 4096
> The cache size is - 2000
>
> Table schema for ITEM
>
> > CREATE TABLE ITEM (
> > FIELD0 NUMERIC,
> > FIELD1 NUMERIC,
> > DS TEXT,
> > FIELD2 TEXT,
> > FIELD3 TEXT,
> > FIELD4 NUMERIC,
> > FIELD5 NUMERIC,
> > FIELD6 NUMERIC,
> > FIELD7 NUMERIC,
> > FIELD8 NUMERIC,
> > FIELD9 NUMERIC,
> > FIELD10 NUMERIC,
> > FIELD11 TEXT);
>
>
> The third column: "DS" is what we query by almost all the time. We also
> created index:
>
> > CREATE INDEX DS_INDEX ON ITEM(DS);
>
>
> There are 30 million records. Zipvfs size of 471mb vs uncompressed db (the
> one shipped with Android 4.2) size of 1.39gb.
> Zipvfs db using zlib and aes128, which are default.
>
> Thanks
> --
>
> Yue Wu |  Android Developer
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Proof that a line has been modified

2017-09-08 Thread Eric Grange
For complete tamper-proofness, you need "something" external to the system,
ie. not in the database, not in the code, and not on the server:
- if you use signatures, then those signatures should be made by a secure
signing service or device
- if you use hashes (be it a merkle tree, a blockchain, or a DAG), then
top-level hashes need to be stored (or signed) on another system (and
verification be made against those)

Note that simple signing of the individual rows would not guard you against
deletions, and if not performed by something external, the signature would
provide no date/time guarantee (ie. even if the falsifier cannot access the
private key, if the attacker can change the signing system time, the
attacker could trick the system into creating "historical" records, thus
greatly simplifying tampering).

To guard against deletions you need hashes at a higher level than the row; a
blockchain or a DAG (like git) is the simplest solution: the hash of a row is
the HMAC of (data of that row + hash of the previous row), and
storing/signing the last row hash externally then allows verifying all
previous rows.
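
A minimal sketch of such a chain, where hmac() stands for an
application-registered UDF (hypothetical, not a built-in SQLite function)
keyed with the application's secret:

   CREATE TABLE audit_log (
      id   INTEGER PRIMARY KEY,
      data TEXT NOT NULL,
      hash TEXT NOT NULL
   );

   -- appending row N computes hash(N) = hmac(data(N) || hash(N-1))
   INSERT INTO audit_log (data, hash)
   SELECT :data,
          hmac(:data || coalesce((SELECT hash FROM audit_log
                                   ORDER BY id DESC LIMIT 1), ''));

The hash of the most recent row is then what gets stored or signed externally.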

Eric


On Fri, Sep 8, 2017 at 1:56 AM, Michael Stephenson 
wrote:

> In the past, I've used the pager to secure data.  This involved encrypting
> the data before writing the data to disk and decrypting when loading from
> disk but also optionally hashing the page and storing the hash in extra
> data reserved for each page.  If anyone tampered with the data, the hash
> would indicate this and an error could be thrown.
>
> Also encrypting the page data makes it more difficult to tamper with the
> data.
>
> Products like sqlcipher do things like this (encryption, hashing), and
> it's fairly easy to see how it's done by pulling the sqlite source (not the
> amalgamation) and diffing it with the sqlcipher source.
>
> ~Mike
>
> > On Sep 7, 2017, at 6:34 PM, Jens Alfke  wrote:
> >
> >
> >
> >> On Sep 7, 2017, at 2:47 PM, Keith Medcalf  wrote:
> >>
> >> Again, this is a detection for changed data and does nothing to prevent
> changes being made.
> >
> > The OP did not require that it be impossible to make changes (which is
> clearly impossible without locking down write access to the file.) He
> specifically said that detection of changed data was OK:
> >
> >>> For security reasons, a customer wants to be sure that a database line
> cannot be modified after its initial insertion (or unmodified without being
> visible, with proof that the line has been modified).
> >
> > The procedures I described provide detection that a row has been
> modified.  The first one doesn't make it evident that a row has been
> deleted, though the second one does.
> >
> > —Jens
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode

2017-08-04 Thread Eric Grange
The main problem with multiple writers would be in conflict resolution,
locking and deadlocks.

Imagine a simple accounting transaction, "update accounts set value = value
+ 1 where ...": if run at the same time from 2 threads (or close enough),
then if you do not have conflict resolution in place you may end up
increasing the value by only 1:

1) alice starts transaction
2) bob starts transaction
3) alice reads value, sees 10
4) bob reads value, sees 10
5) alice writes 10+1
6) bob writes 10+1 ooops
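
Spelled out as SQL (a minimal sketch; the accounts table is from the example
above, the id column is assumed), alice and bob would each run:

   BEGIN;
   SELECT value FROM accounts WHERE id = 1;      -- both read 10
   UPDATE accounts SET value = 11 WHERE id = 1;  -- both write 10+1
   COMMIT;

Without conflict detection both commits succeed and the final value is 11
instead of 12: one increment is silently lost.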

And once you have more than one query, things become even more complicated.

Even among full relational DBs, few manage these correctly. IME only
PostgreSQL and Firebird handle these correctly by default; for Oracle or
MSSQL you have to use special locking modes and transaction options with
significant performance penalties.

Eric

On Fri, Aug 4, 2017 at 8:50 AM, Luc DAVID  wrote:

> Hello,
>
> I was thinking about a possible solution for sqlite "only single writer is
> allowed at the same time" and database lock.
>
> sqlite has WAL mode for better concurrency and this could maybe be used to
> extend the number of writters:
>
> Do you think it would be possible to create a
> MyDb.WAL001...MyDb.WAL.002...MyDb.WAL.nnn when a write operation is
> currently running in order to allow more writers?
>
> The sqlite engine would then take care of dealing with all the WAL files
> when reading data, backup...etc
>
> The maximum of allowed writers could be set by a pragma or another mean
> (when opening the db)
>
> It seems a simply way to boost sqlite concurrency.
>
> Am I wrong on this point ?
>
> Best Regards
>
> Luc
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Version 3.20.0 coming soon...

2017-07-14 Thread Eric Grange
> Could I suggest that the STMT virtual table be renamed as something like
sqlite_statements ?
> Or anything else that starts with "sqlite_" ?

Seconded, especially as more and more virtual tables become available (a
good thing), there is a greater risk of name collision, so an "sqlite_"
namespace would be nice.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF8-BOM and text encoding detection (was: UTF8-BOM not disregarded in CSV import)

2017-06-29 Thread Eric Grange
> The sender, however, could be lying, and this needs to be considered

This is an orthogonal problem: if the sender is sending you data that is
not what it should be, then he could just as well be sending you
well-encoded and well-formed but invalid data, or malware, or
confidential/personal data you are not legally allowed to store, or, or,
or... the list never ends.

And generally speaking, if your code tries too hard to find a possible
interpretation for invalid or malformed input, then you are far more likely
to just end up with processed garbage, which will make it even harder to
figure out down the road where the garbage in your database originated from
(incorrect input? bug in the heuristics? etc.)




On Wed, Jun 28, 2017 at 10:40 PM, Tim Streater <t...@clothears.org.uk> wrote:

> On 28 Jun 2017 at 14:20, Rowan Worth <row...@dug.com> wrote:
>
> > On 27 June 2017 at 18:42, Eric Grange <egra...@glscene.org> wrote:
> >
> >> So while in theory all the scenarios you describe are interesting, in
> >> practice seeing an utf-8 BOM provides an extremely
> >> high likeliness that a file will indeed be utf-8. Not always, but a
> memory
> >> chip could also be hit by a cosmic ray.
> >>
> >> Conversely the absence of an utf-8 BOM means a high probability of
> >> "something undetermined": ANSI or BOMless utf-8,
> >> or something more oddball (in which I lump utf-16 btw)... and the need
> for
> >> heuristics to kick in.
> >>
> >
> > I think we are largely in agreement here (esp. wrt utf-16 being an
> oddball
> > interchange format).
> >
> > It doesn't answer my question though, ie. what advantage the BOM tag
> > provides compared to assuming utf-8 from the outset. Yes if you see a
> utf-8
> > BOM you have immediate confidence that the data is utf-8 encoded, but
> what
> > have you lost if you start with [fake] confidence and treat the data as
> > utf-8 until proven otherwise?
>
> 1) Whether the data contained in a file is to be considered UTF-8 or not
> is an item of metadata about the file. As such, it has no business being
> part of the file itself. BOMs should therefore be deprecated.
>
> 2) I may receive data as part of an email, with a header such as:
>
>Content-type: text/plain; charset="utf-8"
>Content-Transfer-Encoding:  base64
>
> then I interpret that to mean that the attendant data, after decoding from
> base64, is it to be expected to be utf-8. The sender, however, could be
> lying, and this needs to be considered. Just because a header, or file
> metadata, or indeed a BOM, says some data or other is legal utf-8, this
> does not mean that it actually is.
>
>
> --
> Cheers  --  Tim
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [OT] UTF8-BOM and text encoding detection (was: UTF8-BOM not disregarded in CSV import)

2017-06-27 Thread Eric Grange
> ASCII / ANSI is a 7-bit format.

ASCII is a 7 bit encoding, but uses 8 bits in just about any implementation
out there. I do not think there is any 7 bit implementation still alive
outside of legacy mode for low-level wire protocols (RS232 etc.). I
personally have never encountered a 7 bit ASCII file (as in bitpacked), I
am curious if any exists?

ANSI has no precise definition, it's used to lump together all the <= 8 bit
legacy encodings (cf. https://en.wikipedia.org/wiki/ANSI_character_set)

On Tue, Jun 27, 2017 at 1:53 PM, Simon Slavin  wrote:

>
>
> On 27 Jun 2017, at 7:12am, Rowan Worth  wrote:
>
> > In fact using this assumption we could dispense with the BOM entirely for
> > UTF-8 and drop case 5 from the list.
>
> If you do that, you will try to process the BOM at the beginning of a
> UTF-8 stream as if it is characters.
>
> > So my question is, what advantage does
> > a BOM offer for UTF-8? What other cases can we identify with the
> > information it provides?
>
> Suppose your software processes only UTF-8 files, but someone feeds it a
> file which begins with FE FF.  Your software should recognise this and
> reject the file, telling the user/programmer that it can’t process it
> because it’s in the wrong encoding.
>
> Processing BOMs is part of the work you have to do to make your software
> Unicode-aware.  Without it, your documentation should state that your
> software handles the one flavour of Unicode it handles, not Unicode in
> general.  There’s nothing wrong with this, if it’s all the programmer/user
> needs, as long as it’s correctly documented.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [OT] UTF8-BOM and text encoding detection (was: UTF8-BOM not disregarded in CSV import)

2017-06-27 Thread Eric Grange
> In case 7 we have little choice but to invoke heuristics or defer to the
> user, yes?

Yes in theory, but "no" in the real world, or rather "not in any way that
matters"

In the real world, text files are heavily skewed towards 8 bit formats,
meaning just three cases dominate the debate:
- ASCII / ANSI
- utf-8 with BOM
- utf-8 without BOM

And further, the overwhelming majority of text content is likely to
involve ASCII at the beginning (from various markups, think html, xml,
json, source code... even csv, because of explicit separator specification
or 1st column name).

So while in theory all the scenarios you describe are interesting, in
practice seeing an utf-8 BOM provides an extremely
high likeliness that a file will indeed be utf-8. Not always, but a memory
chip could also be hit by a cosmic ray.

Conversely the absence of an utf-8 BOM means a high probability of
"something undetermined": ANSI or BOMless utf-8,
or something more oddball (in which I lump utf-16 btw)... and the need for
heuristics to kick in.

Outside of source code and Linux config files, BOMless utf-8 is certainly
not the most frequent kind of text file; ANSI and various other encodings
dominate, because most non-ASCII text files were (are) produced under DOS
or Windows, where notepad and friends use ANSI by default, for instance.

That may not be a desirable or happy situation, but that is the situation
we have to deal with.

It is also the reason why 20 years later the utf-8 BOM is still in use: it
is explicit and has a practical success rate higher than any of the
heuristics, while collisions of the BOM with actual ANSI (or other) text
starts are unheard of.


On Tue, Jun 27, 2017 at 10:34 AM, Robert Hairgrove 
wrote:

> On Tue, 2017-06-27 at 01:14 -0600, Scott Robison wrote:
> > The original issue was two of the largest companies in the world
> > output the
> > Byte Encoding Mark(TM)(Patent Pending) (or BOM) at the beginning of
> > UTF-8
> > encoded text streams, and it would be friendly for the SQLite3 shell
> > to
> > skip it or use it for encoding identification in at least some cases.
>
> I would suggest adding a command-line argument to the shell indicating
> whether to ignore a BOM or not, possibly requiring specification of a
> certain encoding or list of encodings to consider.
>
> Certainly this should not be a requirement for the library per se, but
> a responsibility of the client to provide data in the proper encoding.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Eric Grange
>Easily solved by never including a superflous BOM in UTF-8 text

And that easy option has worked beautifully for 20 years... not.

Yes, BOM is a misnomer, yes it "wastes" 3 bytes, but in the real world
"text files" have a variety of encodings.
No BOM = you have to fire a whole suite of heuristics or present the user
with choices he/she will not understand.

After 20 years, the choice is between doing the best in an imperfect world,
or perpetuating the issue and blaming others.


On Mon, Jun 26, 2017 at 12:05 PM, Rowan Worth  wrote:

> On 26 June 2017 at 16:55, Scott Robison  wrote:
>
> > Byte Order Mark isn't perfectly descriptive when used with UTF-8. Neither
> > is dialing a cell phone. Language evolves.
> >
>
> It's not descriptive in the slightest because UTF-8's byte order is
> *specified by the encoding*.
>
>  I'm not advocating one way or
> > another, but if a system strips U+FEFF from a text stream after using it
> to
> > determine the encoding, surely it is reasonable to expect that for all
> > supported encodings.
> >
>
> ?? Are you going to strip 0xFE 0xFF from the front of my iso8859-1 encoded
> stream and drop my beautiful smiley? þÿ
> Different encodings demand different treatment. BOM is an artifact of
> 16/32-bit unicode encodings and can kindly keep its nose out of [the
> relatively elegant] UTF-8.
>
> -Rowan
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF8-BOM not disregarded in CSV import

2017-06-26 Thread Eric Grange
Alas, there is no end in sight to the pain from the Unicode decision to not
make the BOM compulsory for UTF-8.

Making it optional or non-necessary basically made every single text file
ambiguous, with non-trivial heuristics and implicit conventions required
instead, resulting in character corruptions that are neither acceptable nor
understood by users.
Making it compulsory would have left pre-Unicode *nix command-line utilities
and C string code in need of fixing (much pain, sure), but in retrospect this
would have been a much smarter choice, as everything could have been settled
in a matter of years.

But now, more than 20 years later, UTF-8 storage is still a mess, with no
end in sight :/


On Sun, Jun 25, 2017 at 9:16 PM, Cezary H. Noweta 
wrote:

> Hello,
>
> On 2017-06-23 22:12, Mahmoud Al-Qudsi wrote:
>
>> I think you and I are on the same page here, Clemens? I abhor the
>> BOM, but the question is whether or not SQLite will cater to the fact
>> that the bigger names in the industry appear hell-bent on shoving it
>> in users’ documents by default.
>>
>
> Given that ‘.import’ and ‘.mode csv’ are “user mode” commands,
>> perhaps leeway can be shown in breaking with standards for the sake
>> of compatibility and sanity?
>>
>
> IMHO, this is not a good way to show a leeway. The Unicode Standard has
> enough bad things in itself. It is not necessary to transform a good
> Unicode's thing into a bad one.
>
> Should SQLite disregard one  sequence, or all 
> sequences, or at most 2, 3, 10 ones at the beginning of a file? Such
> stream can be produced by a sequence of conversions done by a mix of
> conforming and ``breaking the standard for the sake of compatibility''
> converters.
>
> To be clear: I understand your point very well - ``let's ignore optional
> BOM at the beginning'', but I want to show that there is no limit in
> such thinking. Why one optional? You have not pointed out what
> compatibility with. The next step is to ignore N BOMs for the sake of
> compatibility with breaking the standard for the sake of compatibility
> with breaking the standard for the sake of... lim = \infty. I cannot see
> any sanity here.
>
> The standard says: ``Only UTF-16/32 (even not UTF-16/32LE/BE) encoding
> forms can contain BOM''. Let's conform to this.
>
> Certainly, there are no objections to extend an import's functionality
> in such a way that it ignores the initial 0xFEFF. However, an import
> should allow ZWNBSP as the first character, in its basic form, to be
> conforming to the standard.
>
> -- best regards
>
> Cezary H. Noweta
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Questions about SQLite Encryption Extension (SEE)

2017-06-09 Thread Eric Grange
> Isn't it all just obfuscation?

Not really, the encryption protects the file, wherever it is, as long as
the attacker does not have access to the application keys or application
memory.

> If the adversary is another process on the same host, encrypting the db
> just adds obfuscation, which is security against lazy hackers.

Another process would need debug privileges to access your application's
memory.

If you rely on disk encryption primarily, then if that encryption is
compromised, or if backups are compromised, or if a root user copies the
wrong files in the wrong places, or just makes any error, then everything
on that disk can be compromised.

With application-level encryption, user error will only compromise that
app's data, and you otherwise need the root user to be the attacker, which
makes the problem quite different from the root user making a mistake.

Finally, in the grand scheme of things, the likelihood of any disk
encryption being broken (as an implementation) is extremely high, given it
is such a juicy target. And when it is broken, automated tools will be
available for all lazy hackers to download and deploy with a single click.

So while you can and should use disk encryption, it can only be seen as an
added security layer, never as a primary security layer.

Eric


On Fri, Jun 9, 2017 at 12:13 AM, Wout Mertens 
wrote:

> Isn't it all just obfuscation? Any root user can read your key, if not from
> disk then from memory. Any normal user can't read your key, nor from disk,
> nor from memory; and they can't read your db file either.
>
> So if the adversary is someone with access to your disk image, disk
> encryption trumps db encryption (unless the disk encryption is vulnerable
> to known-plaintext attacks, but I guess they probably apply to sqlite too).
>
> If the adversary is another process on the same host, encrypting the db
> just adds obfuscation, which is security against lazy hackers.
>
> On Thu, Jun 8, 2017 at 9:04 PM Richard Hipp  wrote:
>
> > On 6/8/17, Wout Mertens  wrote:
> > > Just musing: is an encrypted disk not more reliable? You have to store
> > the
> > > key somewhere…
> >
> > Maybe.  I guess it depends on your threat model.
> >
> > Encrypting the whole disk is a system setting,.  Anybody who has
> > access to the system can see everything on disk.  You also have to
> > have administrator privileges to set it up.
> >
> > Encrypting a single database file is an application setting.  Some
> > applications might want to hide there data from other applications on
> > the same system, or from the user of the system.  Whole disk
> > encryption won't help there.  And, database encryption requires no
> > special privileges.
> >
> > --
> > D. Richard Hipp
> > d...@sqlite.org
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] RIGHT JOIN! still not supported?

2017-03-22 Thread Eric Grange
For the sake of curiosity, is anyone (as in any human) using RIGHT JOIN?

Personally I never had a need for a RIGHT JOIN, not because of theoretical
or design considerations, but it just never came into my flow of thought
when writing SQL...

I guess some automated SQL query generators could use it though, because
they do not have a "flow of thought".
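
For reference, a sketch of the usual rewrite (table and column names made
up); the RIGHT JOIN form is rejected by SQLite, but swapping the operands
gives a LEFT JOIN returning the same rows (modulo column order):

   -- not accepted by SQLite:
   -- SELECT * FROM a RIGHT JOIN b ON b.a_id = a.id;

   -- equivalent formulation:
   SELECT * FROM b LEFT JOIN a ON b.a_id = a.id;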

On Tue, Mar 21, 2017 at 9:50 PM, Darren Duncan 
wrote:

> What benefit does a RIGHT JOIN give over a LEFT JOIN?  What queries are
> more natural to write using the first rather than the second?
>
> While I can understand arguments based on simple mirror parity, eg we have
> < so we should have > too, lots of other operations don't have mirror
> syntax either.
>
> -- Darren Duncan
>
> On 2017-03-21 8:42 AM, Daniel Kamil Kozar wrote:
>
>> Seeing how SQLite was created in 2000, it seems like nobody really
>> needed this feature for the last 17 years enough in order to actually
>> implement it.
>>
>> Last I heard, patches are welcome on this mailing list. Don't keep us
>> waiting.
>>
>> Kind regards,
>> Daniel
>>
>> On 20 March 2017 at 21:09, PICCORO McKAY Lenz 
>> wrote:
>>
>>> i got this
>>>
>>> Query Error: RIGHT and FULL OUTER JOINs are not currently supported
>>> Unable to execute statement
>>>
>>> still today in 21 ts century?
>>>
>>> Lenz McKAY Gerardo (PICCORO)
>>> http://qgqlochekone.blogspot.com
>>>
>>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] More built-in functions for basic math

2017-03-09 Thread Eric Grange
A bonus of having them defined in the core is that it avoids the minor
inconsistencies that are bound to arise in custom implementations (starting
with the name of math functions)

Main downside is probably not going to be the size, but that it reserves
more names, and may conflict with existing custom implementations.

Eric

On Thu, Mar 9, 2017 at 1:16 PM, R Smith  wrote:

> I second this - Been having a hard time making basic queries with a simple
> x^y function in SQL for SQLite since there is no guarantee what the
> end-user's system will have it compiled-in. I can version-check or
> version-enforce easily, but compile-option check or enforce is a no-go.
>
> If we can shift the basic "Auto-included" feature set a few notches up, we
> can still have hardcore minimalist users compile their own (as they
> probably already do), but it would be nice to know a query running on a
> standard linux or Apple OS on the included SQLite will support some wider
> functions as a rule[1] without having to keep track. I realize this will
> take a time to permeate through the world, but it would be great to start
> asap.
>
> [1] Yes, there are threads on this same forum where I myself kicked
> against bloating SQLite with unneeded functionality as a rule, but perhaps
> the definition of "needed" needs revisiting. I think good math and string
> functions certainly qualify.
>
> Cheers,
> Ryan
>
>
>
> On 2017/03/09 11:45 AM, Dominique Devienne wrote:
>
>> I find that I'm often missing basic mathematical functions in the default
>> shell.
>> Many SQLite clients add many, but given that the official SQLite shell
>> misses
>> them you can't use them in views for predefined "reports" within the DB
>> file itself, for example.
>>
>> There's [1] which is 50KB, but only a tiny part of that is for math
>> functions, so math functions are only a few KBs away.
>>
>> Adding basic math functions and stddev, median, variance, etc... wouldn't
>> add much,
>> and they could be added to the shell at least, if deemed too big for the
>> amalgamation,
>> but given that many things can be turned on/off in the amalgamation, that
>> would be just
>> one more IMHO.
>>
>> The goal here would be to move the "minimum expectations" of what can be
>> done with the official shell, out-of-the-box, w/o the need to resort to
>> .load of an extension which is not readily available in compiled form for
>> many non-programmer users.
>>
>> And IMHO, the ability to use math functions in views is why "moving the
>> baseline" is necessary,
>> since without those being built-in, the views will just error out in the
>> official shell.
>>
>> My $0.02, despite the upcoming chorus about lite or do-it-in-your-app
>> naysayers. Thanks, --DD
>>
>> PS: Sure SQLite's primary use case is as an *embedded* DB, so the host-app
>> can add
>> whatever it wants/needs in terms of UDFs, but I also think the
>> "standalone"
>> use of SQLite independently of the app that generated the DB file is
>> important, and we should raise the
>> bar of the minimum number of built-in functions, starting with the
>> official
>> shell.
>>
>> [1] https://www.sqlite.org/contrib/download/extension-functions.c?get=25
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite 3.18.0 pre-release snapshot

2017-03-07 Thread Eric Grange
Very interesting development, thanks for pushing the boundaries at each new
release!

Would it be possible to consider some form of deferred optimize?
ie. rather than optimize when closing the connection, it would just write
the optimize info gathered during the heavy queries, for use in a future
optimize.

The rationale is that heavy queries can come over several hours (and
multiple DB connections). During those "rush hours" running an ANALYZE
could be quite detrimental when the databases are large (even for read-only
queries, it would affect the service performance through I/O usage), and in
my particular usage case, somewhat less efficient queries are less of a
problem than stalling the service.

Once the rush hours have passed, the optimize would be run with the
previously collected data (I currently run a regular ANALYZE outside of
rush hours, but I gather it is only rarely beneficial).

Eric

On Tue, Mar 7, 2017 at 4:22 AM, jose isaias cabrera 
wrote:

>
> Richard Hipp wrote...
>
> On 3/6/17, jose isaias cabrera  wrote:
>
>>
>> Richard Hipp wrote...
>>
>> Have you ever wondered when you should run ANALYZE on an SQLite
>>> database?  It is tricky to figure out when that is appropriate.  The
>>>
>> Thanks for this.  I actually run this ANALYZE weekly with a script.  This
>> will be better.  I can run it everyday don't do any harm.  Thanks.
>>
>
> Thanks for letting me know.  I don't know if this applies in your case
>> or not, but reading your note made me realize that the documentation
>> might be misleading and/or unclear as written.
>>
>
> The "PRAGMA optimize" command should be run from the same database
>> connection that is doing the heavy queries.  The reason for this is
>> that the database connection remembers (in RAM) specifically which
>> tables and indexes it has considered for use and will only run ANALYZE
>> on those tables for which some prior query would have benefited from
>> having good sqlite_stat1 numbers during the current session.  That is
>> why "PRAGMA optimize" should be run as the database connection is
>> closing, rather than when it is first opened.
>>
>
> So it is not (currently) helpful to run "PRAGMA optimize" from a
>> separate connection, or a connection that is mostly idle.  It needs to
>> be the connection that is actually doing the interesting queries so
>> that SQLite can know which tables need to be analyzed.
>>
>
> This is exactly how it's going to be used...
>
> josé
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite 3.16.0 enters testing

2017-01-01 Thread Eric Grange
> If this is true, then perf is entirely unsuitable for
> microoptimization, since microoptimization depends on having
> reproducible results.

Reproducibility can be misleading however if it is achieved by simulated
CPU cache and instruction pipelines, as this could lead to favoring the
wrong microoptimizations.

Perf stat variability (and other "real" measurements) can be minimized by
making multiple runs and taking the minimum, but also by keeping an eye on
the variance of realtime runs.

A high variance is often the sign that either the host machine is busy
(which can be dealt with seperately) or that the code is relying on
behaviors whose performance is "fragile", such as kernel context switches,
thread switches, mutexes, interrupts, I/O "coincidences" or even race
conditions.

For instance a spinlock may never exceed its spin count to enter sleep/wait
states in a simulated, deterministic environment, but could occasionally do
so in a more realistic setting, which could lead to a very different
performance profile when that happens, and ultimately favor different
optimization strategies ("slower but more robust").

Eric

On Fri, Dec 30, 2016 at 12:03 AM, Dominique Pellé  wrote:

> Bob Friesenhahn  wrote:
>
> > On Thu, 29 Dec 2016, Darko Volaric wrote:
> >
> >> What are you basing that theory on?
> >
> >
> > Perf is claimed to provide very good results but they are real results
> based
> > on real measurements.  Due to this, the measured results are very
> different
> > for the first time the program is executed and the second time it is
> > executed.  Any other factor on the machine would impact perf results.
> >
> > It seems that cachegrind produces absolutely consistent results which do
> not
> > depend on I/O, multi-core, or VM artifacts.
>
> You're right. Consistency can matter more where measuring
> small improvements that add up.
>
> I just tried "valgrind --tool=cachegrind ..." and "perf stat ..."
> with the same command. Valgrind result was more indeed
> more consistent across multiple runs.
>
> Regarding speed of measurement, the same command
> took 13.8 sec with cachegrind vs only 0.28 sec with "perf stat"
> and 0.27 sec with neither cachegrind nor perf stat. So
> perf stat has almost no overhead whereas cachegrind has a
> big overhead, making it impractical when measuring slow
> commands.
>
> Dominique
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Typical suffixes of sqlite database files

2016-10-21 Thread Eric Grange
> What suffixes to sqlite database files do you use or see in the wild?

.sql3
.dat

On Fri, Oct 21, 2016 at 7:46 AM, Jean-Christophe Deschamps  wrote:

> At 01:17 21/10/2016, you wrote:
>
> What suffixes to sqlite database files do you use or see in the wild?
>>
>
> I routinely/always use .sq3 but that's only a personal convention.
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Fastest way to backup a "live" database

2016-10-05 Thread Eric Grange
> But yes, if you have an application which uses only the backup API it
should be safe.

I have added SQLITE_OPEN_EXCLUSIVE to the flags of the backup db. That
should cover accidental cases where the backup db is opened before or
during the backup by "something else" ? It should never happen in my case,
but probably best to leave it in case of copy-pasta later.

> A backup simply copies all pages.  It accesses each page once, so all cache
lookups will fail.

Yes, but a 20% performance hit for a 2000-entry cache seemed a bit steep,
given that a backup should be I/O bound.

After looking with a profiler, it seems that the extra time is spent in
ntdll.dll, not SQLite itself. The source cache_size has an impact, but a
small one; the main impact comes from the destination cache_size.

Also of note, the backup operation can take about half of a CPU core (a
fast E3 core at 3.5 GHZ).

Eric



On Wed, Oct 5, 2016 at 12:50 PM, Clemens Ladisch <clem...@ladisch.de> wrote:

> Eric Grange wrote:
> > Ran some tests with variations of Clemens's backup exemple, and adding
> >
> > sqlite3_exec(src, "PRAGMA cache_size=1", NULL, NULL, NULL);
> > sqlite3_exec(dst, "PRAGMA cache_size=1", NULL, NULL, NULL);
> >
> > seems to provide the best performance [...]
> >
> > While the effect of synchronous=OFF on the destination backup db was
> > something I expected, the cache_size of 1 was not.
> >
> > Using large cache_size had a detrimental effect, both on source or
> > destination, between cache_size 1 and the default of SQLite (-2000) there
> > is a 20% difference in performance during backup.
>
> A backup simply copies all pages.  It accesses each page once, so all
> cache lookups will fail.
>
>
> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Fastest way to backup a "live" database

2016-10-05 Thread Eric Grange
Ran some tests with variations of Clemens's backup exemple, and adding

sqlite3_exec(src, "PRAGMA cache_size=1", NULL, NULL, NULL);

sqlite3_exec(dst, "PRAGMA synchronous=OFF", NULL, NULL, NULL);
sqlite3_exec(dst, "PRAGMA cache_size=1", NULL, NULL, NULL);

seems to provide the best performance, which is about 4 to 5 times faster
than using ".backup" in the CLI.

While the effect of synchronous=OFF on the destination backup db was
something I expected, the cache_size of 1 was not.

Using a large cache_size had a detrimental effect, on both the source and
the destination; between cache_size 1 and SQLite's default (-2000) there is
a 20% difference in performance during backup.

Note that this is on Windows, on an SSD and with lots of free RAM, so it
seems that the OS is better at handling caching than SQLite in that
particular use case.

Eric



On Wed, Oct 5, 2016 at 8:42 AM, Eric Grange <zar...@gmail.com> wrote:

> Thanks for the tool Clemens!
>
> > No, this is what makes the backup restart.  With a step size of -1, it
> would never restart.
>
> Hmm, given WAL mode, this brings another question: why the sqlite CLI does
> not do that but uses 100 pages?
> Is using a backup_step N parameter only useful if you want to provide some
> progress indicator or pace the backup? (ie. pace I/O bandwidth ?)
>
> Or in other words, given WAL mode, what do we "lose" when using -1 and to
> copy everything at once during a backup?
>
>
> On Tue, Oct 4, 2016 at 5:33 PM, Clemens Ladisch <clem...@ladisch.de>
> wrote:
>
>> Eric Grange wrote:
>> > If all else fail, I could also suspend DB writes during backups
>> (suspending
>> > DB reads would be more problematic).
>>
>> With WAL, the backup reader does not block writers.
>>
>> >> Use the backup API, and copy everything in one step.
>> >> (The restart-on-write feature should not be necessary with WAL.)
>> >
>> > That was what I thought initially, but I can only explain the
>> multi-hours
>> > backups with it: usually the backup API takes 4-5 minutes. It is just
>> once
>> > in a while that a very long backup occurs.
>> >
>> >> It calls sqlite3_backup_step() with a size of 100 pages.
>> >
>> > Ok, so I guess the huge cache is overkill with the default CLI!
>>
>> No, this is what makes the backup restart.  With a step size of -1,
>> it would never restart.
>>
>> Use a tool like the one below to do the backup in one step, without
>> restarts.
>>
>>
>> Regards,
>> Clemens
>> --
>>
>> #include <stdio.h>
>> #include <sqlite3.h>
>>
>> int main(int argc, char *argv[])
>> {
>> sqlite3 *src = NULL;
>> sqlite3 *dst = NULL;
>> sqlite3_backup *backup;
>> int rc;
>> int ok = 0;
>>
>> if (argc != 3) {
>> fputs("I want two file names: source, destination\n",
>> stderr);
>> goto error;
>> }
>>
>> rc = sqlite3_open_v2(argv[1], &src, SQLITE_OPEN_READONLY, NULL);
>> if (rc != SQLITE_OK) {
>> fprintf(stderr, "Cannot open %s: %s\n", argv[1],
>> sqlite3_errmsg(src));
>> goto error;
>> }
>>
>> rc = sqlite3_open_v2(argv[2], &dst, SQLITE_OPEN_READWRITE |
>> SQLITE_OPEN_CREATE, NULL);
>> if (rc != SQLITE_OK) {
>> fprintf(stderr, "Cannot open %s: %s\n", argv[2],
>> sqlite3_errmsg(dst));
>> goto error;
>> }
>>
>> sqlite3_exec(src, "PRAGMA busy_timeout=1", NULL, NULL, NULL);
>> sqlite3_exec(dst, "PRAGMA busy_timeout=1", NULL, NULL, NULL);
>>
>> backup = sqlite3_backup_init(dst, "main", src, "main");
>> if (backup == NULL) {
>> fprintf(stderr, "Cannot initialize backup: %s\n",
>> sqlite3_errmsg(dst));
>> goto error;
>> }
>>
>> do {
>> rc = sqlite3_backup_step(backup, -1);
>> } while (rc == SQLITE_BUSY || rc == SQLITE_LOCKED);
>>
>> rc = sqlite3_backup_finish(backup);
>> if (rc == SQLITE_OK) {
>> ok = 1;
>> } else {
>> fprintf(stderr, "Backup failed: %s\n",
>> sqlite3_errmsg(dst));
>> }
>>
>> error:
>> sqlite3_close(dst);
>> sqlite3_close(src);
>>
>> return ok ? 0 : 1;
>> }
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Fastest way to backup a "live" database

2016-10-05 Thread Eric Grange
Thanks for the tool Clemens!

> No, this is what makes the backup restart.  With a step size of -1, it
would never restart.

Hmm, given WAL mode, this brings another question: why does the sqlite CLI
not do that, instead of using 100 pages?
Is using a backup_step N parameter only useful if you want to provide some
progress indicator or pace the backup? (ie. pace I/O bandwidth ?)

Or in other words, given WAL mode, what do we "lose" when using -1 to copy
everything at once during a backup?


On Tue, Oct 4, 2016 at 5:33 PM, Clemens Ladisch <clem...@ladisch.de> wrote:

> Eric Grange wrote:
> > If all else fail, I could also suspend DB writes during backups
> (suspending
> > DB reads would be more problematic).
>
> With WAL, the backup reader does not block writers.
>
> >> Use the backup API, and copy everything in one step.
> >> (The restart-on-write feature should not be necessary with WAL.)
> >
> > That was what I thought initially, but I can only explain the multi-hours
> > backups with it: usually the backup API takes 4-5 minutes. It is just
> once
> > in a while that a very long backup occurs.
> >
> >> It calls sqlite3_backup_step() with a size of 100 pages.
> >
> > Ok, so I guess the huge cache is overkill with the default CLI!
>
> No, this is what makes the backup restart.  With a step size of -1,
> it would never restart.
>
> Use a tool like the one below to do the backup in one step, without
> restarts.
>
>
> Regards,
> Clemens
> --
>
> #include <stdio.h>
> #include <sqlite3.h>
>
> int main(int argc, char *argv[])
> {
> sqlite3 *src = NULL;
> sqlite3 *dst = NULL;
> sqlite3_backup *backup;
> int rc;
> int ok = 0;
>
> if (argc != 3) {
> fputs("I want two file names: source, destination\n",
> stderr);
> goto error;
> }
>
> rc = sqlite3_open_v2(argv[1], &src, SQLITE_OPEN_READONLY, NULL);
> if (rc != SQLITE_OK) {
> fprintf(stderr, "Cannot open %s: %s\n", argv[1],
> sqlite3_errmsg(src));
> goto error;
> }
>
> rc = sqlite3_open_v2(argv[2], &dst, SQLITE_OPEN_READWRITE |
> SQLITE_OPEN_CREATE, NULL);
> if (rc != SQLITE_OK) {
> fprintf(stderr, "Cannot open %s: %s\n", argv[2],
> sqlite3_errmsg(dst));
> goto error;
> }
>
> sqlite3_exec(src, "PRAGMA busy_timeout=1", NULL, NULL, NULL);
> sqlite3_exec(dst, "PRAGMA busy_timeout=1", NULL, NULL, NULL);
>
> backup = sqlite3_backup_init(dst, "main", src, "main");
> if (backup == NULL) {
> fprintf(stderr, "Cannot initialize backup: %s\n",
> sqlite3_errmsg(dst));
> goto error;
> }
>
> do {
> rc = sqlite3_backup_step(backup, -1);
> } while (rc == SQLITE_BUSY || rc == SQLITE_LOCKED);
>
> rc = sqlite3_backup_finish(backup);
> if (rc == SQLITE_OK) {
> ok = 1;
> } else {
> fprintf(stderr, "Backup failed: %s\n",
> sqlite3_errmsg(dst));
> }
>
> error:
> sqlite3_close(dst);
> sqlite3_close(src);
>
> return ok ? 0 : 1;
> }
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Fastest way to backup a "live" database

2016-10-04 Thread Eric Grange
> Have you looked at the ext/session extension ?

Yes, but it is a bit more complicated to integrate, and would impose a
penalty during execution as far as I understand: there are quite a few
intermediate states that would be stored by a changeset, but that do not
really need to be preserved at the end of the day.

Those intermediate states could be moved to a secondary DB, but would then
lose foreign keys and other integrity constraints

This is however an extension that I have been looking at, should the need
for some form of "live" db replication occur.

> I'm sorry.  I misunderstood your question and thought you were just
duplicating the file using OS calls.  The SQLite backup API takes care of
all necessary locking and > consistency problems for you. You should be
fine.

Ok, thanks!

> The problem of backing up a changing database is one of the difficult
problems in database management.  Please don't expect an easy solution.

Right now I am tackling it with brute force: a backup usually takes about
4-5 minutes (to a secondary SSD; that offline copy is then uploaded to a
remote server).
It is mostly the odd occurrences when the backup takes several hours that
are problematic.

If all else fail, I could also suspend DB writes during backups (suspending
DB reads would be more problematic).

> Use the backup API, and copy everything in one step.
> (The restart-on-write feature should not be necessary with WAL.)

That was what I thought initially, but those restarts are the only way I can
explain the multi-hour backups: usually the backup API takes 4-5 minutes. It
is just once in a while that a very long backup occurs.

> It calls sqlite3_backup_step() with a size of 100 pages.

Ok, so I guess the huge cache is overkill with the default CLI!

Thanks


On Tue, Oct 4, 2016 at 4:05 PM, Simon Slavin <slav...@bigfraud.org> wrote:

>
> On 4 Oct 2016, at 2:53pm, Eric Grange <zar...@gmail.com> wrote:
>
> > I am going on the assumption that if something fails during backup, the
> > backup itself will be toast anyway, but is that safe otherwise?
>
> No.  You have no locking.  You might copy the beginning of the file before
> a transaction and the end of the file after it, meaning that pointers at
> one part of the file point to things which no longer exist.
>
> The problem of backing up a changing database is one of the difficult
> problems in database management.  Please don't expect an easy solution.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Fastest way to backup a "live" database

2016-10-04 Thread Eric Grange
Hi,

Given a fairly large database (dozens of gigabytes), which uses WAL, and is
being accessed continuously (mostly read transactions, but regular write
transactions as well), what are the fastest and least disruptive ways to
back it up?

A basic ".backup" from the CLI can occasionnally take hours, as it is
thrown thrown off whenever large write transactions occur.

I have found that the following approaches seem to work, but would like a
confirmation:
- using temp_store = MEMORY
- using synchronous = 0
- using a cache_size as large as possible (as high as possible without
running out of memory)
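
For reference, a minimal sketch of those settings as they could be issued from
the CLI before running ".backup" (values are illustrative; synchronous = 0 is
only acceptable here because a failed backup is discarded anyway):

   PRAGMA temp_store = MEMORY;
   PRAGMA synchronous = 0;
   -- a negative cache_size is expressed in KiB, so this is roughly 2 GB
   PRAGMA cache_size = -2000000;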

I am going on the assumption that if something fails during backup, the
backup itself will be toast anyway, but is that safe otherwise?

Also the precompiled CLI for Windows (sqlite3.exe) of 3.14.2 fails at
around 2 GB RAM, which in my case is about 500,000 pages; is that already
too high, or could it be worth going with a 64-bit CLI?

Thanks!

Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ideas for optional json fields?

2016-09-26 Thread Eric Grange
Nice solution and works like a charm, thanks Clemens!
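
For reference, a minimal sketch of how the suggested query behaves with literal
values (field names are illustrative):

   SELECT (SELECT json_group_object(key, value)
           FROM json_each(json_object('field1', 'value', 'field2', NULL))
           WHERE value IS NOT NULL);
   -- returns {"field1":"value"}, the null field is omitted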

On Mon, Sep 26, 2016 at 4:22 PM, Clemens Ladisch <clem...@ladisch.de> wrote:

> Eric Grange wrote:
> > I am generating a json with json_object() function, and I would like to
> > omit the null or default fields from the output json
>
> SELECT (SELECT json_group_object(key, value)
> FROM json_each(json_object('field1', field1, 'field2', field2))
> WHERE value IS NOT NULL)
> FROM MyTable;
>
>
> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ideas for optional json fields?

2016-09-26 Thread Eric Grange
Using coalesce on the value still produces the field in the JSON, with an empty
string (or whatever value it was coalesced to).

What I really would like to get is {"field1":"value"} rather than
{"field1":"value","field2":null} or {"field1":"value","field2":""}

I also tried setting the field name to null, but that triggers an error
"json_object() labels must be TEXT"

On Mon, Sep 26, 2016 at 2:28 PM, R Smith <rsm...@rsweb.co.za> wrote:

>
> On 2016/09/26 11:15 AM, Eric Grange wrote:
>
>> Hi,
>>
>> I am generating a json with json_object() function, and I would like to
>> omit the null or default fields from the output json (for compacity
>> reasons)
>>
>> When there is only one such field, I have used a "case when" like in
>>
>>
>> select
>> case when field2 is null then
>>json_object('field1', field1)
>> else
>>json_object('field1', field1, 'field2', field2)
>> end
>> ...
>>
>>
>> but when there are multiple such optional fields, this approach becomes
>> unwieldy.
>>
>
> I'm not experienced playing with the JSON extension much, but the solution
> that feels correct to me would be:
>
> select json_object('field1', COALESCE(field1,'""'), 'field2',
> COALESCE(field2,'""'))
>
> or some such variant that produces the correct output...
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Ideas for optional json fields?

2016-09-26 Thread Eric Grange
Hi,

I am generating a json with the json_object() function, and I would like to
omit the null or default fields from the output json (for compactness reasons)

When there is only one such field, I have used a "case when" like in


select
   case when field2 is null then
  json_object('field1', field1)
   else
  json_object('field1', field1, 'field2', field2)
   end
...


but when there are multiple such optional fields, this approach becomes
unwieldy.

Anyone has a better idea?

Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] "Responsive" website revamp at www.sqlite.org

2016-09-06 Thread Eric Grange
Well, I guess I am in the minority using a tabbed browser then :)

In tabbed browsers, all browser tabs share the same window width, and it is
quite impractical to resize the browser every time you switch a tab.

On Tue, Sep 6, 2016 at 11:06 AM, R Smith <rsm...@rsweb.co.za> wrote:

>
>
> On 2016/09/06 10:19 AM, Stephan Beal wrote:
>
>> On Tue, Sep 6, 2016 at 10:15 AM, Eric Grange <zar...@gmail.com> wrote:
>>
>> However, while you are at it, an improvement for the website when browsed
>>> from a computer would be to limit the max width of the pages, especially
>>> for the documentation pages. Currently if you have a big screen the lines
>>> of text stretch to the whole browser width, which is not very readable.
>>>
>>> A counter-opinion, though apparently in the small minority: i _absolutely
>> despise_ fixed-width web site layouts.
>>
>>
> I'm afraid I'm with Mr. Beal on this - Windows (for all its faults) lets
> me drag the size of a window (including browser windows) to whatever I like
> them to be. It feels extremely presumptuous of a web-site (or its designer)
> to decide that I am incapable of picking a size that "reads well". I think
> the convention has merit for people who simply clicks the little "Maximize"
> button and then expects to see content that is easy to read. That's
> laziness in my opinion, but I'm not judging, just don't want them to decide
> my read-width.
>
> Catering for user who don't know how to use computers is a bit of an Apple
> thing (and sadly lately a bit of a Windows thing too), luckily Linux still
> thinks I am the boss. I don't mind that these conventions exist to help
> those folks, I just wish I had the choice... always.
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] "Responsive" website revamp at www.sqlite.org

2016-09-06 Thread Eric Grange
> A counter-opinion, though apparently in the small minority: i _absolutely
> despise_ fixed-width web site layouts.

Just to clarify, this is not fixed width, but limited max width, ie. it
only kicks in when the browser window is very large. The site is fluid
(like now) at smaller widths.

On Tue, Sep 6, 2016 at 10:19 AM, Stephan Beal <sgb...@googlemail.com> wrote:

> On Tue, Sep 6, 2016 at 10:15 AM, Eric Grange <zar...@gmail.com> wrote:
>
> > However, while you are at it, an improvement for the website when browsed
> > from a computer would be to limit the max width of the pages, especially
> > for the documentation pages. Currently if you have a big screen the lines
> > of text stretch to the whole browser width, which is not very readable.
> >
>
> A counter-opinion, though apparently in the small minority: i _absolutely
> despise_ fixed-width web site layouts.
>
> --
> - stephan beal
> http://wanderinghorse.net/home/stephan/
> "Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
> those who insist on a perfect world, freedom will have to do." -- Bigby
> Wolf
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] "Responsive" website revamp at www.sqlite.org

2016-09-06 Thread Eric Grange
Seems to be fine from here, and probably not too critical, as not all
websites have a purpose on mobile devices.

However, while you are at it, an improvement for the website when browsed
from a computer would be to limit the max width of the pages, especially
for the documentation pages. Currently if you have a big screen the lines
of text stretch to the whole browser width, which is not very readable.

For instance right now you have something like
   http://imgur.com/a/nRfno
IMHO it would be more readable as
   http://imgur.com/a/EkKZA

Above I just added "max-width:1100px" and "margin:auto" to the body style.

And for those that do not like the extra white, it can be skinned to gray
as in http://imgur.com/a/1tRMR with just the following CSS (and hopefully
there are some that could contribute even better skins)

html {
background: #eee;
}
body {
background: #fff;
padding: 0 2em;
box-shadow: 0 0 .3em #888;
margin-top: .5em;
}

Eric

On Tue, Sep 6, 2016 at 2:32 AM, Stephen Chrzanowski 
wrote:

> I have my ancient Galaxy S (Original
> ) Android hanging
> here on my desk, and I pulled up the draft page, and it looks good.
>
> With respect, Dr. Hipp, I really do hope you weren't REALLY pressured into
> this (And the analogy between the soda-straw compared against my desktops
> 5-monitor setup is the absolute truth), and this is just more of a side
> project or an itch to scratch to give the site a refresh.  If so, good
> work, for the *MANY* people who use sqlite.org as a reference point, and I
> guess it'll be worth while for the 5 people who use their phones to look up
> information rather than use a desktop computer they do their development
> on. ;)
>
> On Mon, Sep 5, 2016 at 4:55 PM, Richard Hipp  wrote:
>
> > Most of the world views the internet on their phone now, I am told,
> > and websites are suppose to be "responsive", meaning that they
> > reformat themselves to be attractive and useful for the majority who
> > view them through a 320x480 pixel soda-straw.  In an effort to conform
> > to this trend, I have made some changes to the *draft* SQLite website
> > (http://sqlite.org/draft) Your feedback on these changes is
> > appreciated.  Please be sure to try out the new design both on a
> > narrow-screen phone and on a traditional desktop browser.  The goal is
> > to provide a more mobile-friendly website without reducing the
> > information content available to desktop users.
> >
> > --
> > D. Richard Hipp
> > d...@sqlite.org
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Fastest way to add many simple rows to a table?

2016-05-31 Thread Eric Grange
> I tried your example and got ~0.45s for json and ~0.37s for my
> implementation;

I have also tried a custom virtual table, and I also get figures about
20-30% faster than with json.

> Interesting: when NOT using "primary key" (I have a flag for this), the
> times are ~ 0.26s vs 0.18s vs 0.47s !

Interesting indeed, I had (incorrectly) assumed that providing the rowid
would help with performance.

> I agree json works fast enough for such simple example, but I don't know
> how you could use it for e.g. 2 fields;

There are two approaches I found that work with varying degrees of
efficiency depending on the data (how many columns, how large...):

* concat the row data into a string, pass those as a json array, then in
sqlite, use json_each, instr, substr and cast to parse back into columns (a
sketch of this parsing is shown after the second form below)

* stream the rows as a json array of json objects, and parse in sqlite with
json_each and json_extract, using

   select json_extract(value, '$.a') a, json_extract(value, '$.b') b
   from (select value from json_each('[{"a":1,"b":2},{"a":3,"b":3}]'))

or

   select json_extract(value, '$.a') a, json_extract(value, '$.b') b
   from json_each('[{"a":1,"b":2},{"a":3,"b":3}]')
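
For the first form, a minimal sketch of the parsing side (assuming two integer
columns packed as 'a|b' strings; table name and separator are illustrative):

   insert into target (a, b)
   select cast(substr(value, 1, instr(value, '|') - 1) as integer),
          cast(substr(value, instr(value, '|') + 1) as integer)
   from json_each('["1|2","3|4"]')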

from "a mile high", both forms had a similar performance, but I have not
looked at the vdbe nor at the bowels of json_extract/each.

I will likely go with the custom virtual table in my particular case, but
being able to insert straight from json is something I will definitely use
knowing it's faster than doing it with code (all APIs that revolve around
running queries on input data would be a perfect fit).

Eric



On Tue, May 31, 2016 at 3:32 PM, Gabriel Corneanu  wrote:

> Hi Eric,
>
> As I know you from Delphi related projects, I thought it would be nice to
> share my ideas.
> I am using a home-grown framework for sqlite persistence, optimized for
> storing records into tables.
> Main ideas:
> - using virtual tables (with the same structure) to speed up sqlite inserts
> a write translates to: insert into test("ID") select "ID" from
> temp.tmp_test
>
> - based on RTTI to generate optimized code for accessing/returning data
> - supports all common data (ints, floats, string, blob/TBytes)
>
> I tried your example and got ~0.45s for json and ~0.37s for my
> implementation; I have also tried using simple recursive cte (see code
> below) but it was slower, ~0.66s.
> I think most of the time is spent in packing the integers and building the
> btree(s) (i.e. CPU bound). Enabling the "name" field below ("storage"
> attribute) increased the processing time (my implementation only) to ~0.47,
> which I think confirms my interpretation.
>
> Interesting: when NOT using "primary key" (I have a flag for this), the
> times are ~ 0.26s vs 0.18s vs 0.47s !
> Looks like sqlite is optimized to generate the rowid, and it is slower when
> explicitly set (even if it's the same value).
>
> I agree json works fast enough for such simple example, but I don't know
> how you could use it for e.g. 2 fields;
>
>
> Hope it helps,
> Gabriel
>
> Code:
>
> type
>   TTestRecord = record
> [Storage('ID', [sfPrimaryKey])]
> //[Storage('ID')]
> ID : Int32;
> //[Storage('name')]
> name: string;
>   end;
>
> procedure TForm2.Button2Click(Sender: TObject);
> var
>   db : TSQLiteDatabase;
>   tb : TVirtualTableDataArray;
>   r : TTestRecord;
>   i : integer;
>   pf, t,t1,t2,t3: int64;
>   json: TStringBuilder;
> begin
>   tb := TVirtualTableDataArray.Create;
>   json := TStringBuilder.Create;
>   db := TSQLiteDatabase.Create;
>   db.OpenWrite(':memory:');
>
>   tb.Setup(db.DB.DB, 'test');
>   tb.CreateTable;
>   tb.VTInit;
>
>   json.Append('[');
>   for i := 1 to 1000000 do
>   begin
> r.ID := i;
> r.name := i.ToString;
> tb.Items.Add(r);
> if i > 1 then
>   json.Append(',');
> json.Append(i);
>   end;
>   json.Append(']');
>
>   QueryPerformanceCounter(t);
>   db.db.ExecSQL('insert into test(id) select value from json_each(?)',
> [json.ToString]);
>   QueryPerformanceCounter(t1);
>   t1 := t1-t;
>
>   tb.CreateTable;
>   QueryPerformanceCounter(t);
>   tb.WriteData;
>   QueryPerformanceCounter(t2);
>   t2 := t2-t;
>
>   tb.CreateTable;
>   QueryPerformanceCounter(t);
>   db.db.ExecSQL('WITH RECURSIVE ' +
> 'cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE
> x<'+tb.Count.ToString+
> ') insert into test(id) select x from cnt');
>   QueryPerformanceCounter(t3);
>   t3 := t3-t;
>
>
>   QueryPerformanceFrequency(pf);
>   ShowMessageFmt('N=%d  T1=%.2fs  T2=%0.2fs T3=%.2fs', [tb.Count, t1/pf,
> t2/pf, t3/pf]);
>
>   tb.Free;
>   db.Free;
>   json.Free;
> end;
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org

Re: [sqlite] Fastest way to add many simple rows to a table?

2016-05-27 Thread Eric Grange
@Keith yes, a custom table-valued function should be faster, as it would
alleviate the roundtrip through json, though that json round-tripping
accounts for only about 10-20% of the CPU time AFAICT.

@Richard and others: executing individual insert statements always gives
lower performance than using json_each; it is more than 2 times slower when
many rows are involved.

For instance in the case of a ":memory:" database, with the table

   create table test (id INTEGER PRIMARY KEY)


and inserting into it the numbers from 1 to 1000000 (one million), within a
transaction and with 3.13.0

When using insert into prepared statements, the inserts take 1.95 seconds
here, but if using json_each, only 0.74 seconds (including the creation of
the JSON, which itself is a matter of milliseconds).

Code looks like the following (the methods are ultra-thin wrappers around
SQLite API calls)


   db := TSQLiteDatabase.Open(':memory:');
   db.ExecSQL('create table test (id INTEGER PRIMARY KEY)');
   QueryPerformanceCounter(t1);
   db.BeginTransaction;
  stmt := db.CreateStatement('insert into test values (?)');
  for i := 1 to NB do begin
 stmt.BindInteger(1, i);
 stmt.Step;
 stmt.Reset;
  end;
  stmt.Finalize;
   db.Commit;
   QueryPerformanceCounter(t2);


and for the json_each


   db := TSQLiteDatabase.Open(':memory:');
   db.ExecSQL('create table test (id INTEGER PRIMARY KEY)');
   QueryPerformanceCounter(t1);
   db.BeginTransaction;
  stmt := db.CreateStatement('insert into test select value from
json_each(?)');
  stmt.BindStringA(1, json.ToUTF8String);
  stmt.Step;
  stmt.Reset;
  stmt.Finalize;
   db.Commit;
   QueryPerformanceCounter(t2);


Note that the loop itself is not the bottleneck: when commenting out Step/Reset
calls and leaving just 1 million BindInteger calls, it runs in 44 ms (which
is roughly comparable to the time it takes to create the json string)

In my few tests, passing the data as json and using json1 functions was
consistently faster at least up to 3 columns and when using integers or
small strings.

So "insert from select" seems to have some avantage, and I was wondering if
there are other ways to leverage it?




On Fri, May 27, 2016 at 12:29 AM, Dominique Pellé <dominique.pe...@gmail.com
> wrote:

> Richard Hipp <d...@sqlite.org> wrote:
>
> > On 5/26/16, Eric Grange <zar...@gmail.com> wrote:
> >>
> >> I am looking for the fastest way to insert many rows to a simple table.
> >
> > (1) Create a prepared statement:  "INSERT INTO xyzzy VALUES(?1,?2,...)"
> > (2) Run "BEGIN"
> > (3) Loop over each row you want to insert, bind values to the prepared
> > statement, then call sqlite3_step() and sqlite3_reset().
> > (4) Run "COMMIT"
> > (5) Call sqlite3_finalize() on the prepared statement to avoid a memory
> leak.
>
>
> Additionally, "PRAGMA synchronous=off;" gives significant
> speed up for insertions, if you don't mind a corrupted database
> in case of system crash. See:
>
> https://www.sqlite.org/pragma.html#pragma_synchronous
>
> Dominique
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Fastest way to add many simple rows to a table?

2016-05-26 Thread Eric Grange
Hi,

I am looking for the fastest way to insert many rows to a simple table.

By "simple" I mean a relation table (with just a couple integer/key fields)
or even a single-column (temp table used for filtering as an alternative to
a "in" literal), but I want to insert hundreds of thousandths of them as
fast as possible.

The fastest I have found so far is to use an insert from a select, with
json_each() used to provide the data.
Using multiple "insert into" statements, even with a prepared statement within
a transaction, is quite a bit slower.
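
For reference, a minimal sketch of that insert-from-select pattern (table name
and values are illustrative):

   create temp table filter_ids (id INTEGER PRIMARY KEY);
   insert into filter_ids (id)
   select value from json_each('[1,2,3,5,8,13]');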

The json_each approach would satisfy my need, but I am curious if there is
another approach that could be suggested? (without involving a custom
function)

Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Are BLOB stored inline with the rest of the record fields?

2016-03-04 Thread Eric Grange
Thanks for all the details!

I was already storing the really large blobs in their own tables.

However I will have to review the situation for smallish blobs (about as
large as the rest of the fields in a record), as moving them to another
table would halve the number of pages involved for queries that do not need
them... of course that would have to be balanced with more complex sql, and
more page reads in the cases where those blobs are needed... why can't we
both have our cake and eat it? :)
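
For reference, a minimal sketch of that split layout (table and column names
are illustrative):

   CREATE TABLE records (
      id     INTEGER PRIMARY KEY,
      field1 INTEGER,
      field2 TEXT
   );
   CREATE TABLE record_blobs (
      record_id INTEGER PRIMARY KEY REFERENCES records (id),
      payload   BLOB
   );
   -- fetch the blob only for the queries that actually need it
   SELECT r.field1, b.payload
   FROM records r LEFT JOIN record_blobs b ON b.record_id = r.id
   WHERE r.id = ?;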

On Thu, Mar 3, 2016 at 5:40 PM, Drago, William @ CSG - NARDA-MITEQ <
William.Drago at l-3com.com> wrote:

> I haven't seen any anyone else mention this yet. Another consideration
> when working with large blobs is to make those columns that last ones in
> your schema. To quote Dr. Hipp:
>
> "make the BLOB columns the last column in your tables.  Or even store the
> BLOBs in a separate table which only has two columns: an integer primary
> key and the blob itself, and then access the BLOB content using a join if
> you need to. If you put various small integer fields after the BLOB, then
> SQLite has to scan through the entire BLOB content (following the linked
> list of disk pages) to get to the integer fields at the end, and that
> definitely can slow you down."
>
> Found here:
>
>
> http://sqlite.1065341.n5.nabble.com/Effect-of-blobs-on-performance-td19559.html#a19560
>
> Your blobs are small so this probably doesn't apply to your application,
> but something to keep in mind for future projects.
>
> --
> Bill Drago
> Staff Engineer
> L3 Narda-MITEQ
> 435 Moreland Road
> Hauppauge, NY 11788
> 631-272-5947 / William.Drago at L-3COM.com
>
>
> > -Original Message-
> > From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-
> > bounces at mailinglists.sqlite.org] On Behalf Of Eric Grange
> > Sent: Thursday, March 03, 2016 3:16 AM
> > To: SQLite mailing list
> > Subject: Re: [sqlite] Are BLOB stored inline with the rest of the record
> fields?
> >
> > > All BLOBs are stored inline.  The only complication comes when the
> > > total
> > row length is longer than a page.
> >
> > Ok thanks!
> >
> > That is besides my original question but what is the complication in
> that case?
> > Is the blob partially stored in the row, or is some indirection to other
> pages
> > used, with only some pointer stored in the row?
> >
> > > Surely you mean big-endian?
> >
> > Yes, my mistake :)
> >
> > > Storing Blobs has a few conversions added if you try to use the SQL to
> > store it, but it's quite efficient when using the API.
> >
> > By "use SQL", you mean as a literal blob embedded in the SQL string?
> >
> > My main use case would be through the API, the actual numbers are fixed
> > precision, and so scattered over the range they cannot be displayed to
> end
> > users without using exponents, so some formatting will have to happen.
> > Besides size, using a blob rather than base32/base64 would simplify the
> > encoding/decoding, and for debugging, blobs usually display in an
> > hexadecimal form here, so a big-endian blob would be directly "readable".
> >
> >
> >
> > On Wed, Mar 2, 2016 at 11:07 PM, Doug Currie 
> > wrote:
> >
> > > On Wed, Mar 2, 2016 at 4:42 PM, R Smith  wrote:
> > > >
> > > >
> > > > Personally, unless your space is constrained, I would simply save
> > > > the numbers as strings, perhaps Hex or BCD with leading chars and
> > > > convert as needed. This would sort correctly without tricks and not
> > > > do much worse
> > > for
> > > > space. (Base64 would be even better space-wise but won't sort
> correct).
> > > >
> > >
> > > There is an encoding defined in RFC 4648 "Base-N Encodings" that does
> > > preserve sort order; it is called Base 32 Encoding with Extended Hex
> > > Alphabet. I would think the Base64 alphabet could be rearranged to
> > > have the same property.
> > >
> > > e
> > > ___
> > > sqlite-users mailing list
> > > sqlite-users at mailinglists.sqlite.org
> > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> > >
> > ___
> > sqlite-users mailing list
> > sqlite-users at mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

[sqlite] Are BLOB stored inline with the rest of the record fields?

2016-03-03 Thread Eric Grange
> All BLOBs are stored inline.  The only complication comes when the total
row length is longer than a page.

Ok thanks!

That is besides my original question but what is the complication in that
case?
Is the blob partially stored in the row, or is some indirection to other
pages used, with only some pointer stored in the row?

> Surely you mean big-endian?

Yes, my mistake :)

> Storing Blobs has a few conversions added if you try to use the SQL to
store it, but it's quite efficient when using the API.

By "use SQL", you mean as a literal blob embedded in the SQL string?

My main use case would be through the API, the actual numbers are fixed
precision, and so scattered over the range they cannot be displayed to end
users without using exponents, so some formatting will have to happen.
Besides size, using a blob rather than base32/base64 would simplify the
encoding/decoding, and for debugging, blobs usually display in an
hexadecimal form here, so a big-endian blob would be directly "readable".



On Wed, Mar 2, 2016 at 11:07 PM, Doug Currie  wrote:

> On Wed, Mar 2, 2016 at 4:42 PM, R Smith  wrote:
> >
> >
> > Personally, unless your space is constrained, I would simply save the
> > numbers as strings, perhaps Hex or BCD with leading chars and convert as
> > needed. This would sort correctly without tricks and not do much worse
> for
> > space. (Base64 would be even better space-wise but won't sort correct).
> >
>
> There is an encoding defined in RFC 4648 "Base-N Encodings" that does
> preserve sort order; it is called Base 32 Encoding with Extended Hex
> Alphabet. I would think the Base64 alphabet could be rearranged to have the
> same property.
>
> e
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Are BLOB stored inline with the rest of the record fields?

2016-03-02 Thread Eric Grange
Hi,

I am wondering if very small BLOBs are stored inline with the record fields
(in the same page) or in another page?

So far I had not worried about it, but now I have a use case where I need
to store 128- and 256-bit integers (numbers, not IDs or hashes). SQLite does
not support such large integers natively, so I am planning to store them as
little-endian binary blobs, so that the BINARY collation can work for
sorting, and for the rest I would do the processing with custom functions
or plain old code.
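
For reference, a minimal sketch of that layout (names are illustrative; the
application would produce a fixed-width binary encoding of the number so that
blob comparisons order the values as intended):

   CREATE TABLE amounts (
      id    INTEGER PRIMARY KEY,
      value BLOB NOT NULL   -- fixed-width 16 or 32 byte encoding
   );
   CREATE INDEX amounts_value_idx ON amounts (value);
   SELECT id FROM amounts ORDER BY value LIMIT 10;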

However I will have quite a lot of those numbers, and would like to know if
they would be stored inline with the rest of the records (which would be
preferable for db size), or with an indirection to another page of the db
(so a larger db). If they are stored in another page, is there an alternative,
such as passing the binary as "utf8 text" and forcing a COLLATE BINARY on the
column?

(yes, I could test that or use the source, but I would also like to have
some "designer feedback", as what current SQLite version does may not be
what designers intend for future SQLite versions)

Thanks!


[sqlite] Performance comparison between SQLite and SQL Server?

2016-02-16 Thread Eric Grange
IME when SQLite is applicable (mostly the one writer limit), it usually
runs circles around server DBs, provided:
- you have comparable CPU
- you have enough cache memory, or free RAM for OS disk caching, to be in a
comparable memory scenario
- you do not have a virtualisation layer, especially compared against a
server DB running on bare-metal, as that can make a huge difference in
effective IOPS

Also the ability of SQLite to smoothly work with temporary tables can make
a world of difference when faced with complex queries. That is not
something that will show in 1:1 benchmarks, but when a complex query can be
broken down into several simpler queries + temporary tables, well, you can
save lots on both execution and debugging times.


On Tue, Feb 16, 2016 at 3:02 PM, olivier vidal 
wrote:

>
> obviously it will never be perfect . We already know the benefits of each
> database , including those of SQLITE . This is not because there is a
> benchmark that people are not going to look at all the other features. But
> in equal hardware, it gives a small indication of current performance, as
> sqlite.org had done a long time ago.
>
Simon Slavin a écrit :
>
>> On what hardware ?  Should the SQLite test be on a laptop because SQLite
>> will run on a laptop ?  Or should you test both on identical hardware even
>> though they're unlikely to run on identical hardware ?  Should you include
>> the time taken to install and configure PostgreSQL in the times quoted
>> because it takes no time to install or configure SQLite ?
>>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] json1 not escaping CRLF characters

2016-02-04 Thread Eric Grange
Thanks!

I can make do with replace() for now (as in my particular case, I only have
CRLF & TAB to replace)
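
For reference, a minimal sketch of that workaround, applying replace() to the
generated JSON text (assuming only CR, LF and TAB need escaping; table and
column names are illustrative):

   SELECT replace(replace(replace(
            json_object('field1', field1, 'field2', field2),
            char(13), '\r'), char(10), '\n'), char(9), '\t')
   FROM MyTable;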

On Thu, Feb 4, 2016 at 11:39 AM, Richard Hipp  wrote:

> On 2/4/16, Richard Hipp  wrote:
> > On 2/4/16, Eric Grange  wrote:
> >> Hello,
> >>
> >> I have been experimenting with using the json1 extension to generate
> json
> >> sql-side rather than code-side, but I am hitting an issue with the CR &
> >> LF
> >> characters which are not escaped, which results in json that fails
> >> JSON.parse or use as JSONP in browsers. Ideally CR / LF should be
> encoded
> >> as \r and \n respectively.
> >>
> >> Is it by design or an oversight?
> >
> > Oversight.  https://www.sqlite.org/src/tktview/ad2559db380a
> >
>
> Now fixed on trunk.  Snapshot available at https://www.sqlite.org/
> and the change is mentioned on the draft change log for the next
> release at https://www.sqlite.org/draft/releaselog/3_11_0.html
>
> --
> D. Richard Hipp
> drh at sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] json1 not escaping CRLF characters

2016-02-04 Thread Eric Grange
Hello,

I have been experimenting with using the json1 extension to generate json
sql-side rather than code-side, but I am hitting an issue with the CR & LF
characters which are not escaped, which results in json that fails
JSON.parse or use as JSONP in browsers. Ideally CR / LF should be encoded
as \r and \n respectively.

Is it by design or an oversight?

Is there a way around using the replace() function to fix it? (though I
guess other control characters will have the issue, so multiple replace()
would be needed)

Eric


[sqlite] Backing up SQLite DB with unbuffered I/O

2015-10-29 Thread Eric Grange
Is there a way to perform an online backup with unbuffered I/O?

I am currently performing backups with the command-line tool and ".backup",
but this appears to wreak havoc on the OS disk cache.

I am able to mitigate this by disabling buffering on the destination drive,
but that cannot be used for the source drive.

Eric


[sqlite] SQLite not using primary key index anymore

2015-05-19 Thread Eric Grange
> The SQLite I have here uses the primary key.

It works now after another round of drop index/create index/analyze.

The problem is/was related to the DB: I have multiple databases with the
same schema, and only the largest one (38 GB) had the issue; the smaller ones
did not.

> Which version?

While it happened I tested on 3.8.10.1 and 3.8.8.1

The DB is using WAL mode, and there were several connections to it, I
suppose this might have interfered with the index or stats?





On Tue, May 19, 2015 at 10:50 AM, Clemens Ladisch 
wrote:

> Eric Grange wrote:
> > I recently added a field and index to an existing table, and now SQLite
> > seems to be using that index in place of the primary key, even on simple
> > queries
> >
> > CREATE TABLE vin (
> >id INTEGER PRIMARY KEY AUTOINCREMENT,
> >tx_id INTEGER NOT NULL,
> >from_vout_id INTEGER,
> >addr_id INTEGER REFERENCES addresses (id),  -- newly added field is
> this one
> >FOREIGN KEY(tx_id) REFERENCES transactions(id),
> >FOREIGN KEY(from_vout_id) REFERENCES vout(id)
> > )
> >
> > CREATE INDEX vin_addr_idx on vin (addr_id)
> >
> > And the offending query is
> >
> > select id from vin order by id desc limit 1
> >
> > for which the query plan is
> >
> > 0 0 0 SCAN TABLE vin USING INDEX vin_addr_idx
> > 0 0 0 USE TEMP B-TREE FOR ORDER BY
>
> The SQLite I have here uses the primary key.
>
> > This is with the precompiled sqlite.dll.
>
> Which version?
>
>
> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] SQLite not using primary key index anymore

2015-05-19 Thread Eric Grange
Also while using

select id from vin NOT INDEXED order by id desc limit 1


works around the problem, this is just one query of many (and a simplified
one at that), and I have other less trivial queries where the inefficient
default query plan has the same catastrophic effect on performance, so this
would not be too practical.





On Tue, May 19, 2015 at 10:15 AM, Eric Grange  wrote:

> Hi,
>
> I recently added a field and index to an existing table, and now SQLite
> seems to be using that index in place of the primary key, even on simple
> queries
>
> Table declaration:
>
> CREATE TABLE vin (
>id INTEGER PRIMARY KEY AUTOINCREMENT,
>tx_id INTEGER NOT NULL,
>from_vout_id INTEGER,
>addr_id INTEGER REFERENCES addresses (id),  -- newly added field is
> this one
>FOREIGN KEY(tx_id) REFERENCES transactions(id),
>FOREIGN KEY(from_vout_id) REFERENCES vout(id)
> )
>
>
> The new index in question is
>
> CREATE INDEX vin_addr_idx on vin (addr_id)
>
>
> And the offending query is
>
> select id from vin order by id desc limit 1
>
>
> for which the query plan is
>
> 0 0 0 SCAN TABLE vin USING INDEX vin_addr_idx
> 0 0 0 USE TEMP B-TREE FOR ORDER BY
>
>
> As the table has about 175 millions of rows, this ends up very badly,
> taking dozens of seconds rather than a few microseconds as before.
> (fwiw when the order is ascending, the query plan is the same)
>
> Previously it was doing a simple "SCAN TABLE vin", and was instant.
> There are a couple other indexes on the same table for tx_id &
> from_vout_id, but these did not ever cause that issue.
>
> I have run "analyze" and the sqlite_stat1 table contains for the index
>
> vin vin_addr_idx 175799408 220
>
>
> Even when deleting that line from the sqlite_stat1, the query plan does
> not change.
>
> When dropping the vin_addr_idx index, the query plan reverts to the fast
> one, and the query is instant again.
> When re-creating the index, the issue reappears.
>
> I did a pragma integrity_check, and it came out clean. This is with the
> precompiled sqlite.dll.
>
> Any ideas?
>
> Eric
>
>
>
>


[sqlite] SQLite not using primary key index anymore

2015-05-19 Thread Eric Grange
Hi,

I recently added a field and index to an existing table, and now SQLite
seems to be using that index in place of the primary key, even on simple
queries

Table declaration:

CREATE TABLE vin (
   id INTEGER PRIMARY KEY AUTOINCREMENT,
   tx_id INTEGER NOT NULL,
   from_vout_id INTEGER,
   addr_id INTEGER REFERENCES addresses (id),  -- newly added field is this
one
   FOREIGN KEY(tx_id) REFERENCES transactions(id),
   FOREIGN KEY(from_vout_id) REFERENCES vout(id)
)


The new index in question is

CREATE INDEX vin_addr_idx on vin (addr_id)


And the offending query is

select id from vin order by id desc limit 1


for which the query plan is

0 0 0 SCAN TABLE vin USING INDEX vin_addr_idx
0 0 0 USE TEMP B-TREE FOR ORDER BY


As the table has about 175 million rows, this ends up very badly,
taking dozens of seconds rather than a few microseconds as before.
(fwiw when the order is ascending, the query plan is the same)

Previously it was doing a simple "SCAN TABLE vin", and was instant.
There are a couple other indexes on the same table for tx_id &
from_vout_id, but these did not ever cause that issue.

I have run "analyze" and the sqlite_stat1 table contains for the index

vin vin_addr_idx 175799408 220


Even when deleting that line from the sqlite_stat1, the query plan does not
change.

When dropping the vin_addr_idx index, the query plan reverts to the fast
one, and the query is instant again.
When re-creating the index, the issue reappears.

I did a pragma integrity_check, and it came out clean. This is with the
precompiled sqlite.dll.

Any ideas?

Eric


[sqlite] Multi-table index ersatz?

2015-03-04 Thread Eric Grange
> Rowids will be faster than primary keys.

My primary keys are ROWIDs ("INTEGER PRIMARY KEY" actually)

None of the indexes was exploited for the order by, and the matched records
in table A are scattered in pages all over the database, so ordering them
in memory has a tendency to "replace" the whole SQLite cache: the first time a
query is run it's slow, the second time it's fast, but if you change the
condition value (?1) then it's slow again as the page cache is invalidated
(it is very visible in the resource monitor, with a disk access spike).

> You might be able to make the new table a WITHOUT ROWID table and set its
PRIMARY KEY up with the same (or a superset of the) fields of your "fairly
large index" in order to save a bit of space.

I have been experimenting that way, and actually, since A1 and B1 should fit
in 32-bit integers for the foreseeable future, combining them into a 64-bit
integer is possible, and I use (A1 << 32) | B1 as the "INTEGER PRIMARY KEY"
(ROWID). This makes a separate composite index unnecessary as the primary
key becomes the composite index: the equality condition becomes a range
condition on the rowid, with an order by on the rowid, both being fast and
cache-friendly.
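
For reference, a minimal sketch of that combined-key layout (table and column
names are illustrative):

   CREATE TABLE c (
      id   INTEGER PRIMARY KEY,  -- (A1 << 32) | B1, computed by the application
      a_id INTEGER,
      b_id INTEGER
   );
   -- "A1 = ?1 order by B1" becomes a rowid range scan:
   SELECT id & 0xFFFFFFFF AS b1, a_id, b_id
   FROM c
   WHERE id >= (?1 << 32) AND id < ((?1 + 1) << 32)
   ORDER BY id
   LIMIT 100;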

It reduces disk usage significantly over the previous full-blown C table +
composite index, but it is still a sort of manually-managed, hacky index, which
involves extra queries to maintain. But at the moment it seems to be the
"better" solution.

> It might be possible to write a virtual table module that does the same
> as your index on C, but with C being a view.

I had a look that way, but AFAICT virtual tables do not support indexing,
so I would have to index manually.


On Tue, Mar 3, 2015 at 4:57 PM, Dan Kennedy  wrote:

> On 03/03/2015 06:10 PM, Eric Grange wrote:
>
>> Hi,
>>
>> I have problem where I need a "multi-table index" ersatz, or maybe a
>> better
>> data structure :-)
>>
>> The problem is as follow:
>>
>> - Table A : some fields plus fields A1 & A2
>> - Table B : some fields plus fields B1 & B2
>>
>> Both tables have several dozen millions of rows, and both are accessed
>> independently of each others by some queries, their current structure has
>> no performance issues for those queries.
>>
>> However I have a new query which is like
>>
>> select ...some fields of A & B...
>> from A join B on A.A2 = B.B2
>> where A.A1 = ?1
>> order by B.B1
>> limit 100
>>
>>
>> Without the limit, there can be tens of thousandths resulting rows,
>> without
>> the A1 condition, there can be millions of resulting rows.
>>
>> With indexes on A & B, the performance of the above is not very good, as
>> indexing A1 is not enough, and indexing B1 is not enough either, so no
>> query plan is satisfying.
>>
>> I can make the query instantaneous by duplicating the A1 & B1 fields in a
>> dedicated C table (along with the primary keys of A & B), index that
>> table,
>> and then join back the A & B table to get the other fields.
>>
>> However this results in a fairly large table of duplicated data, whose
>> sole
>> purpose is to allow the creation of a fairly large index, which gets me
>> the
>> performance.
>>
>
> You might be able to make the new table a WITHOUT ROWID table and set its
> PRIMARY KEY up with the same (or a superset of the) fields of your "fairly
> large index" in order to save a bit of space.
>
>
>
>
>
>> Note that if the fields A1 & B1 are removed from their tables and kept
>> only
>> in C, this has massive performance implication on other queries running
>> only against A & B, as those fields are leveraged in other composite
>> indexes.
>>
>> Is there a better way that would not involve duplicating the data?
>>
>> Eric
>> ___
>> sqlite-users mailing list
>> sqlite-users at mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Multi-table index ersatz?

2015-03-03 Thread Eric Grange
Yes, A2 & B2 are already indexed (individually and in composite indexes).
The problem is that this indexing is not selective enough when taken in
isolation.
Le 3 mars 2015 12:36, "Simon Davies"  a écrit
:

> On 3 March 2015 at 11:10, Eric Grange  wrote:
> >
> > Hi,
> >
> > I have problem where I need a "multi-table index" ersatz, or maybe a
> better
> > data structure :-)
> >
> > The problem is as follow:
> >
> >- Table A : some fields plus fields A1 & A2
> >- Table B : some fields plus fields B1 & B2
> >
> > Both tables have several dozen millions of rows, and both are accessed
> > independently of each others by some queries, their current structure has
> > no performance issues for those queries.
> >
> > However I have a new query which is like
> >
> > select ...some fields of A & B...
> > from A join B on A.A2 = B.B2
> > where A.A1 = ?1
> > order by B.B1
> > limit 100
> >
> > Without the limit, there can be tens of thousandths resulting rows,
> without
> > the A1 condition, there can be millions of resulting rows.
> >
> > With indexes on A & B, the performance of the above is not very good, as
> > indexing A1 is not enough, and indexing B1 is not enough either, so no
> > query plan is satisfying.
>
> Have you tried indexing on A2?
>
> .
> .
> .
> > Is there a better way that would not involve duplicating the data?
> >
> > Eric
>
> Regards,
> Simon
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Multi-table index ersatz?

2015-03-03 Thread Eric Grange
Hi,

I have problem where I need a "multi-table index" ersatz, or maybe a better
data structure :-)

The problem is as follow:

   - Table A : some fields plus fields A1 & A2
   - Table B : some fields plus fields B1 & B2

Both tables have several dozen millions of rows, and both are accessed
independently of each others by some queries, their current structure has
no performance issues for those queries.

However I have a new query which is like

select ...some fields of A & B...
from A join B on A.A2 = B.B2
where A.A1 = ?1
order by B.B1
limit 100


Without the limit, there can be tens of thousands of resulting rows; without
the A1 condition, there can be millions of resulting rows.

With indexes on A & B, the performance of the above is not very good, as
indexing A1 is not enough, and indexing B1 is not enough either, so no
query plan is satisfying.

I can make the query instantaneous by duplicating the A1 & B1 fields in a
dedicated C table (along with the primary keys of A & B), indexing that table,
and then joining back to the A & B tables to get the other fields.
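
For reference, a minimal sketch of that C table (names are illustrative):

   CREATE TABLE c (
      a_id INTEGER REFERENCES a (id),
      b_id INTEGER REFERENCES b (id),
      a1   INTEGER,
      b1   INTEGER
   );
   CREATE INDEX c_a1_b1_idx ON c (a1, b1);

   SELECT a.*, b.*
   FROM c
   JOIN a ON a.id = c.a_id
   JOIN b ON b.id = c.b_id
   WHERE c.a1 = ?1
   ORDER BY c.b1
   LIMIT 100;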

However this results in a fairly large table of duplicated data, whose sole
purpose is to allow the creation of a fairly large index, which gets me the
performance.

Note that if the fields A1 & B1 are removed from their tables and kept only
in C, this has massive performance implication on other queries running
only against A & B, as those fields are leveraged in other composite
indexes.

Is there a better way that would not involve duplicating the data?

Eric


[sqlite] Appropriate Uses For SQLite

2015-02-19 Thread Eric Grange
> https://www.sqlite.org/whentouse.html :

I am currently relying on SQLite as the main database engine for blockchain
explorers; that's usually between 1 and 2 million HTTP queries per day,
pretty much all of them hitting more than 60 GB of SQLite databases. There
is a mix of simple table queries, complex joins and extensive
aggregation/statistics queries.

This is kind of a sweet spot for SQLite WAL: the data is queried and
updated all the time, but with only one updater task. Performance is very
good, better than what I see from client/server DB under much lighter
loads. And the latency is just excellent: simple queries run in a matter of
microseconds.

And while it may not be fashionable to say so, I really like the SQL in
SQLite :)
There is no query that cannot be performed efficiently and with little
code. The worst that happens is you need a new index or a few temporary tables
to break up a complex query into several simpler steps.
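
For reference, a minimal sketch of that kind of breakup (table and column names
are illustrative):

   CREATE TEMP TABLE recent_tx AS
      SELECT id, addr_id, amount FROM transactions WHERE block_id >= 500000;
   CREATE INDEX temp.recent_tx_addr_idx ON recent_tx (addr_id);

   SELECT a.addr, sum(t.amount)
   FROM recent_tx t
   JOIN addresses a ON a.id = t.addr_id
   GROUP BY a.addr;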

I would say SQLite is perfectly suited for "front line" web servers that
serve, present and generally make accessible live data coming from other
systems.

Eric


[sqlite] Registering Virtual Tables on the spot (when needed)

2013-11-07 Thread Eric Grange
Hi,

I would like to use SQLite in a scripting context where there are thousands
of "potential" virtual tables, but of these, only a very very small subset
(a couple or a handful at most) would be actually used in any given script,
so it wouldn't make sense to expose them all, all the time.

I'm looking for a way to leverage the existing SQLite SQL parser to figure
out which of those tables are actually used, and expose only those.

Basically, some approach like
- pass the SQL without exposing anything, which could result in an error
- if the error corresponds to "unknown table", expose the table
- repeat until there are no more errors (or a "real" error is encountered)

I could use a custom SQL parser, but that would be complicated, would have
to be kept compatible with SQLite evolutions, and the parsing would be
completely wasted when the tables have already been exposed.

Parsing the error message is simple and works, but it's also fragile if the
message text is changed... Is there a more reliable way? Something like an
sqlite3_errentity() that would return the unknown table name when the
sqlite3_errcode() says the error is an unknown table, that would return the
unknown column name when the errcode is an unknown field etc.

Eric
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS3 performance rowid vs docid

2009-12-18 Thread Eric Grange
Hello All,

Using 3.6.21 and FTS3, I've noticed that when doing search queries,
using docid as in

select fts.docid from fts where fts.content match "la"

the search is much slower than when using rowid as in

select fts.rowid from fts where fts.content match "la"

Isn't docid supposed to be an alias of rowid in the case of FTS3 tables?
Is there anything wrong that can happen when using rowid instead of docid?
Or is it merely a missing optimization?

The speed difference depends on the number of returned results, but it
can be several orders of magnitude when the index isn't in cache, and
the memory usage is much higher too. I guess in one case the value is
obtained directly from the index, while in the other one it's read from
the fts_content table; is this correct?

Thanks!
Eric

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Virtual Tables performance

2008-02-20 Thread Eric Grange
We've begun implementing some virtual tables with the latest SQLite build
and have run into a few questions performance-wise; all these questions
revolve around the same issue (join performance).

1) Is there a way to know which fields are going to be potentially
requested by xColumn? For instance if I have a virtual table with 10
columns, but in the query only 2 of them are referred to, is there a way to
know which ones these are? (without parsing the SQL independently from SQLite)
I'm asking because the virtual tables we've been considering would be
aggregates, with potentially hundreds of virtual columns, and knowing
which xColumn calls may happen after an xFilter would be quite helpful.

2) Since SQLite uses loops for joins, this can result in a lot of
xFilter calls, and when the actual data is in an other DB server, this
means a lot of actual queries, which are slow, even when xFilter queries
for a specific primary key/index. To reduce this load, we've used
temporary tables with good success (selecting from the virtual table
into the temp table, then joining against the temp table).
Would there be a way to use sub-selects for that purpose? (ie. without
manually creating the temp table)
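
For reference, a minimal sketch of that temp-table staging (names are
illustrative):

   CREATE TEMP TABLE staged AS
      SELECT key_field, col1, col2
      FROM my_virtual_table
      WHERE key_field BETWEEN 40 AND 70;

   SELECT l.*, s.col1, s.col2
   FROM local_table l
   JOIN staged s ON s.key_field = l.key_field;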

3) As soon as there are "or" or "in" terms in the SQL, xBestIndex isn't
solicited and xFilter requests everything unfiltered; the solution to
that we found was to "union" the selects, which doesn't exactly improve
readability. Is there a better solution?
In our case, merely having a range constraint would already be enough to
drastically improve the row count of xFilter (ie. "field in (40, 50,
70)" being presented as a "field>=40 and field<=70" constraint)

Thanks,

Eric

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users