[sqlite] sqliteDefaultBusyCallback and HAVE_USLEEP
I noticed that sqliteDefaultBusyCallback() seems to depend directly on the OS
(it behaves differently based on SQLITE_OS_WIN || HAVE_USLEEP). Since the
underlying primitive, sqlite3OsSleep(), actually uses the VFS to sleep, and
unixSleep() also rounds up to whole seconds when HAVE_USLEEP is not defined,
any time-resolution limitations are already handled there. And when a custom
VFS is configured, that VFS may well be able to sleep in milli- or
microseconds using an RTOS-specific function that is not usleep() -- for
example FreeRTOS has osDelay().

Is there a reason sqliteDefaultBusyCallback() has this dual implementation,
and is defining HAVE_USLEEP the correct way to get better performance on
platforms that don't have usleep()? Or could it be simplified?

Thanks,
Peter
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
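For readers following along from an application: the default busy callback discussed above is what sqlite3_busy_timeout() installs, and from Python's built-in sqlite3 module it is reachable via PRAGMA busy_timeout. A minimal sketch (the 2000 ms value is an arbitrary illustration):

```python
import sqlite3

# The default busy callback discussed above is installed by
# sqlite3_busy_timeout(); from Python it is reachable as PRAGMA busy_timeout.
# The 2000 ms value is an arbitrary illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA busy_timeout = 2000")   # retry on SQLITE_BUSY for up to ~2 s
timeout_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]
print(timeout_ms)  # 2000
```

How finely those retries sleep is exactly what the HAVE_USLEEP discussion above is about: the sleeping itself is delegated to the VFS.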
Re: [sqlite] more efficient JSON encoding: idle musing
> On Feb 25, 2020, at 6:12 AM, J Decker wrote:
>
> other than that; if space is really a concern, maybe a zip layer?

In my experience, the concern is more about speed than size. Given the raw
string/blob data from a SQLite column, and a specific property name/path, how
fast can you find its value, convert it to a format SQLite's query engine
understands, and return it?

The JSON1 extension's parser seems pretty darn fast, but given the nature of
JSON, it has to do a bunch of scanning that's O(n) with the length of the
data. It's generally impossible to find a value in the middle of a JSON
string without examining every single byte that came before it.

The way to go faster is to use formats that let you jump through the
structure faster. For example, tokenizing dictionary keys and encoding an
object's keys as a fixed-width sorted list lets you look up a key in
O(log n) time, and the constant factor is very small because you're comparing
integers, not strings.

I don't know if extracting 'classes' will help much in a query. The data will
be smaller, and it makes it extremely fast to look up values if you know the
class ahead of time, but in a query you don't know the class. Compared to
what I described above, there's an extra step where you have to look up the
class description from the object. I also worry about use cases where the
number of 'classes' becomes unwieldy, because the schema might be using a
huge set of keys. For example, something where a customer order contains an
object that maps SKUs to quantities, like {"A73563M1": 3, "A73522M0": 7, …}.
And there are tens of thousands of SKUs.

—Jens
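The "tokenized keys + fixed-width sorted list" idea above can be sketched in a few lines. This is purely illustrative (the token table, encoding, and names are hypothetical, not any real format): property names become integer tokens, each object stores its tokens sorted, and lookup is a binary search over integers instead of an O(n) scan of JSON text.

```python
from bisect import bisect_left

# Hypothetical shared dictionary mapping property names to integer tokens.
token_table = {"name": 0, "qty": 1, "sku": 2}

def encode(obj):
    # Store each object as (sorted token list, parallel value list).
    pairs = sorted((token_table[k], v) for k, v in obj.items())
    return [t for t, _ in pairs], [v for _, v in pairs]

def lookup(encoded, name):
    # O(log n) integer comparisons instead of scanning the serialized text.
    keys, vals = encoded
    t = token_table[name]
    i = bisect_left(keys, t)
    return vals[i] if i < len(keys) and keys[i] == t else None

enc = encode({"sku": "A73563M1", "qty": 3, "name": "widget"})
print(lookup(enc, "qty"))  # 3
```

A real encoding would also need fixed-width offsets into the value storage, but the lookup cost argument is the same.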
Re: [sqlite] Fwd: inserting new data only
I keep forgetting that the min/max optimization is not applied at the same
time as retrieving other data from a table, so if you actually want to
optimize the generated plan, you need to use the following trigger:

create trigger data_insert before insert on data
begin
  select raise(ABORT, 'Out of order insert')
   where julianday(new.key) <= coalesce((select julianday(max(key)) from data), -1);
  select raise(IGNORE)
   where new.data IS (select data from data order by key desc limit 1);
end;

The query optimizer is smart enough to recognize that when max(key) is used
by itself and an index is available on key, the index can be used to locate
the row containing the max key (it is the last one, and there is no need to
look any further). However, if you "select max(key), data from table" then
the optimization is not applied and a scan of the whole table is done looking
for the max(key) even though there is a suitable index, and you have to
specify the index, how to use it, and that you only need the first result.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users On
>Behalf Of Keith Medcalf
>Sent: Tuesday, 25 February, 2020 14:44
>To: SQLite mailing list
>Subject: Re: [sqlite] Fwd: inserting new data only
>
>If you are going to do it all in one insert statement rather than
>using a before trigger to throw an error (ie, you want to silently ignore
>out-of-order inserts) then the following is slightly more efficient since
>the query planner appears to materialize the search so it only does it once:
>
>with old(key, data)
>  as (
>      select coalesce(julianday(max(key)), -1),
>             data
>        from data
>     ),
>     new(key, data)
>  as (
>      values (?, ?)
>     )
>insert into data
>  select new.key, new.data
>    from new, old
>   where new.data IS NOT old.data
>     and julianday(new.key) > old.key;
>
>However, without the trigger the database will not enforce the
>monotonicity of the insert timestamps ...
>
>Note that you could do the whole thing in before triggers, which would
>mean you just use a regular old insert and the triggers do all the work,
>and then the database would entirely enforce its own integrity and rules
>... no matter who or what was trying to insert records ...
>
>create table data
>(
>    key  text primary key,
>    data integer not null
>)
>without rowid;
>
>create trigger data_prohibit_oo_inserts before insert on data
>  when julianday(new.key) <= (select julianday(max(key)) from data)
>begin
>  select raise(ABORT, 'Out of order insert');
>end;
>
>create trigger data_prohibit_duplicates before insert on data
>  when new.data IS (select data from (select max(key), data from data))
>begin
>  select raise(IGNORE);
>end;
>
>-- insert into data values (?, ?);
>
>insert into data values ('10:32', 12);
>insert into data values ('10:35', 15);
>insert into data values ('10:37', 15);
>insert into data values ('10:39', 13);
>insert into data values ('10:43', 13);
>insert into data values ('10:46', 18);
>
>select * from data;
>10:32|12
>10:35|15
>10:39|13
>10:46|18
>
>insert into data values ('10:32', 12);
>Error: near line 33: Out of order insert
>insert into data values ('10:35', 15);
>Error: near line 34: Out of order insert
>insert into data values ('10:37', 15);
>Error: near line 35: Out of order insert
>insert into data values ('10:39', 13);
>Error: near line 36: Out of order insert
>insert into data values ('10:43', 13);
>Error: near line 37: Out of order insert
>insert into data values ('10:46', 18);
>
>select * from data;
>10:32|12
>10:35|15
>10:39|13
>10:46|18
>
>You could even do that with just one before trigger ...
>
>create table data
>(
>    key  text primary key,
>    data integer not null
>)
>without rowid;
>
>create trigger data_insert before insert on data
>begin
>  select raise(ABORT, 'Out of order insert')
>   where julianday(new.key) <= coalesce((select julianday(max(key)) from data), -1);
>  select raise(IGNORE)
>   where new.data IS (select data from (select max(key), data from data));
>end;
>
>-- insert into data values (?, ?);
>
>insert into data values ('10:32', 12);
>insert into data values ('10:35', 15);
>insert into data values ('10:37', 15);
>insert into data values ('10:39', 13);
>insert into data values ('10:43', 13);
>insert into data values ('10:46', 18);
>
>select * from data;
>10:32|12
>10:35|15
>10:39|13
>10:46|18
>
>insert into data values ('10:32', 12);
>Error: near line 28: Out of order insert
>insert into data values ('10:35', 15);
>Error: near line 29: Out of order insert
>insert into data values ('10:37', 15);
>Error: near line 30: Out of order insert
>insert into data values ('10:39', 13);
>Error: near line 31: Out of order insert
>insert into data values ('10:43', 13);
>Error: near line 32: Out of order insert
>insert into data values ('10:46', 18);
>Error:
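The single-before-trigger scheme quoted above can be exercised end to end. This sketch replays the sample data through Python's built-in sqlite3 module (the Python driver code around the SQL is illustrative) and checks that duplicates are silently skipped while an out-of-order insert aborts:

```python
import sqlite3

# Replay the single before-trigger scheme from the message above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table data (key text primary key, data integer not null) without rowid;

create trigger data_insert before insert on data
begin
  select raise(ABORT, 'Out of order insert')
   where julianday(new.key) <= coalesce((select julianday(max(key)) from data), -1);
  select raise(IGNORE)
   where new.data IS (select data from (select max(key), data from data));
end;
""")

samples = [('10:32', 12), ('10:35', 15), ('10:37', 15),
           ('10:39', 13), ('10:43', 13), ('10:46', 18)]
for key, value in samples:
    conn.execute("insert into data values (?, ?)", (key, value))  # repeats ignored

rows = conn.execute("select * from data").fetchall()
print(rows)  # [('10:32', 12), ('10:35', 15), ('10:39', 13), ('10:46', 18)]

err = ""
try:
    conn.execute("insert into data values ('10:32', 12)")  # out of order -> ABORT
except sqlite3.IntegrityError as exc:
    err = str(exc)
print(err)
```

raise(IGNORE) skips the row without surfacing an exception, while raise(ABORT) arrives in Python as an IntegrityError carrying the trigger's message.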
Re: [sqlite] Fwd: inserting new data only
If you are going to do it all in one insert statement rather than using a
before trigger to throw an error (ie, you want to silently ignore
out-of-order inserts) then the following is slightly more efficient since the
query planner appears to materialize the search so it only does it once:

with old(key, data)
  as (
      select coalesce(julianday(max(key)), -1),
             data
        from data
     ),
     new(key, data)
  as (
      values (?, ?)
     )
insert into data
  select new.key, new.data
    from new, old
   where new.data IS NOT old.data
     and julianday(new.key) > old.key;

However, without the trigger the database will not enforce the monotonicity
of the insert timestamps ...

Note that you could do the whole thing in before triggers, which would mean
you just use a regular old insert and the triggers do all the work, and then
the database would entirely enforce its own integrity and rules ... no matter
who or what was trying to insert records ...

create table data
(
    key  text primary key,
    data integer not null
)
without rowid;

create trigger data_prohibit_oo_inserts before insert on data
  when julianday(new.key) <= (select julianday(max(key)) from data)
begin
  select raise(ABORT, 'Out of order insert');
end;

create trigger data_prohibit_duplicates before insert on data
  when new.data IS (select data from (select max(key), data from data))
begin
  select raise(IGNORE);
end;

-- insert into data values (?, ?);

insert into data values ('10:32', 12);
insert into data values ('10:35', 15);
insert into data values ('10:37', 15);
insert into data values ('10:39', 13);
insert into data values ('10:43', 13);
insert into data values ('10:46', 18);

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

insert into data values ('10:32', 12);
Error: near line 33: Out of order insert
insert into data values ('10:35', 15);
Error: near line 34: Out of order insert
insert into data values ('10:37', 15);
Error: near line 35: Out of order insert
insert into data values ('10:39', 13);
Error: near line 36: Out of order insert
insert into data values ('10:43', 13);
Error: near line 37: Out of order insert
insert into data values ('10:46', 18);

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

You could even do that with just one before trigger ...

create table data
(
    key  text primary key,
    data integer not null
)
without rowid;

create trigger data_insert before insert on data
begin
  select raise(ABORT, 'Out of order insert')
   where julianday(new.key) <= coalesce((select julianday(max(key)) from data), -1);
  select raise(IGNORE)
   where new.data IS (select data from (select max(key), data from data));
end;

-- insert into data values (?, ?);

insert into data values ('10:32', 12);
insert into data values ('10:35', 15);
insert into data values ('10:37', 15);
insert into data values ('10:39', 13);
insert into data values ('10:43', 13);
insert into data values ('10:46', 18);

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

insert into data values ('10:32', 12);
Error: near line 28: Out of order insert
insert into data values ('10:35', 15);
Error: near line 29: Out of order insert
insert into data values ('10:37', 15);
Error: near line 30: Out of order insert
insert into data values ('10:39', 13);
Error: near line 31: Out of order insert
insert into data values ('10:43', 13);
Error: near line 32: Out of order insert
insert into data values ('10:46', 18);
Error: near line 33: Out of order insert

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a
lot about anticipated traffic volume.
>-Original Message-
>From: sqlite-users On
>Behalf Of Keith Medcalf
>Sent: Tuesday, 25 February, 2020 13:15
>To: SQLite mailing list
>Subject: Re: [sqlite] Fwd: inserting new data only
>
>On Tuesday, 25 February, 2020 12:23, Przemek Klosowski
> wrote:
>
>>On Tue, Feb 25, 2020 at 1:18 PM Keith Medcalf
>>wrote:
>
>>> create table data
>>> (
>>>     key  text primary key,
>>>     data integer not null
>>> )
>>> without rowid;
>>>
>>> -- insert into data select ?, ? as value where value IS NOT (select
>>>data from (select max(key), data from data));
>>>..
>>> Constraints:
>>>
>>> (1) Will only work for appending data (new key > all keys in table)
>>> (2) Types of key and data are immaterial as long as you are only
>>>inserting (appending) new keys.
>
>>Awesome---exactly what's needed.
>>The monotonicity of the time key variable is assured by how the data
>>is collected---but is there a way to express that in sqlite?
>>create table data (
>>  key text primary key check
>>    (julianday(key) > julianday(select max(key) from data)),
>>  data integer not null);
>
>You cannot do this with a CHECK constraint since check constraints cannot
>execute select statements (check constraints should be table invariant --
>meaning that they must return the same
Re: [sqlite] Fwd: inserting new data only
On Tuesday, 25 February, 2020 12:23, Przemek Klosowski wrote:

>On Tue, Feb 25, 2020 at 1:18 PM Keith Medcalf wrote:

>> create table data
>> (
>>     key  text primary key,
>>     data integer not null
>> )
>> without rowid;
>>
>> -- insert into data select ?, ? as value where value IS NOT (select
>>data from (select max(key), data from data));
>>..
>> Constraints:
>>
>> (1) Will only work for appending data (new key > all keys in table)
>> (2) Types of key and data are immaterial as long as you are only
>>inserting (appending) new keys.

>Awesome---exactly what's needed.
>The monotonicity of the time key variable is assured by how the data
>is collected---but is there a way to express that in sqlite?
>create table data (
>  key text primary key check
>    (julianday(key) > julianday(select max(key) from data)),
>  data integer not null);

You cannot do this with a CHECK constraint since check constraints cannot
execute select statements (check constraints should be table invariant --
meaning that they must return the same result no matter what data is in the
table or other tables, so they act only to validate the data on the current
row).

This is a case for a before insert trigger to prohibit the insert before it
is performed (an after trigger would also work, but that would fire after the
row is already inserted and would work by doing a statement rollback to
delete the inserted row; you want to avoid the row insertion completely, and
the way to do that is with a before trigger):

create trigger data_prohibit_oo_inserts before insert on data
  when julianday(new.key) <= (select julianday(max(key)) from data)
begin
  select raise(ABORT, 'Out of order insert');
end;

This means a lookup and check is done after the record to be inserted has
been computed, but the btree will already be in memory and will already have
been traversed to the last entry, so this consumes only CPU, and very little
at that.
create table data
(
    key  text primary key,
    data integer not null
)
without rowid;

create trigger data_prohibit_oo_inserts before insert on data
  when julianday(new.key) <= (select julianday(max(key)) from data)
begin
  select raise(ABORT, 'Out of order insert');
end;

-- insert into data select ?, ? as value where value != (select value from (select max(key), value from data));

insert into data select '10:32', 12 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:35', 15 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:37', 15 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:39', 13 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:43', 13 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:46', 18 as value where value IS NOT (select data from (select max(key), data from data));

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

insert into data select '10:32', 12 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 27: Out of order insert
insert into data select '10:35', 15 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 28: Out of order insert
insert into data select '10:37', 15 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 29: Out of order insert
insert into data select '10:39', 13 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 30: Out of order insert
insert into data select '10:43', 13 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 31: Out of order insert
insert into data select '10:46', 18 as value where value IS NOT (select data from (select max(key), data from data));

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

You could also change the insert like this:

insert into data
select ? as key, ? as value
 where value IS NOT (select data from (select max(key), data from data))
   and julianday(key) > coalesce((select julianday(max(key)) from data), -1);

but that would just silently ignore the error rather than raising one. (You
need the coalesce because "select julianday(max(key)) from data" could be
null, and it has to be non-null for the > expression to return a not-null
result: anything > null returns null, which is FALSE in a where clause, and
NOT would not help since "NOT (anything > null)" is still null, hence false.
The trigger does not have this problem because a NULL result means the
trigger simply does not fire.) Of course, changing the raise(ABORT ...) in
the trigger to raise(IGNORE) achieves the same result.

create table data
(
    key  text
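The NULL behaviour described in the parenthetical above can be seen directly. A quick sketch: comparing against NULL yields NULL (shown as None from Python), a WHERE clause treats NULL as false, NOT does not rescue it, and coalescing to -1 restores a usable comparison:

```python
import sqlite3

# Why the coalesce() is needed: "anything > NULL" is NULL, "NOT (anything
# > NULL)" is still NULL, but coalesce(NULL, -1) < 5 is plain true (1).
conn = sqlite3.connect(":memory:")
row = conn.execute(
    "select 5 > NULL, NOT (5 > NULL), coalesce(NULL, -1) < 5"
).fetchone()
print(row)  # (None, None, 1)
```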
Re: [sqlite] Fwd: inserting new data only
> Awesome---exactly what's needed.
> The monotonicity of the time key variable is assured by how the data
> is collected---but is there a way to express that in sqlite?
> create table data (
>   key text primary key check
>     (julianday(key) > julianday(select max(key) from data)),
>   data integer not null);

That would/should be done in a trigger, and not a check constraint. A check
constraint is only supposed to be something which will _always_ be true about
that one and only record, which needs only the contents of that one record to
determine, and not something that might change depending on ... anything
else.
Re: [sqlite] Fwd: inserting new data only
On Tue, Feb 25, 2020 at 1:18 PM Keith Medcalf wrote:

> create table data
> (
>     key  text primary key,
>     data integer not null
> )
> without rowid;
>
> -- insert into data select ?, ? as value where value IS NOT (select data
> from (select max(key), data from data));
>..
> Constraints:
>
> (1) Will only work for appending data (new key > all keys in table)
> (2) Types of key and data are immaterial as long as you are only inserting
> (appending) new keys.

Awesome---exactly what's needed.
The monotonicity of the time key variable is assured by how the data
is collected---but is there a way to express that in sqlite?

create table data (
  key text primary key check
    (julianday(key) > julianday(select max(key) from data)),
  data integer not null);
Re: [sqlite] inserting new data only
This strikes me as best solved in the programming language. If a single set
of data points is being acquired in real time, and you have a programming
language (or script) generating the INSERT commands, why not simply keep the
most recently inserted temperature in a variable?

On the other hand, if you have multiple sensors, or out-of-order insertion,
or a stateless insertion program, you could insert every reading and, before
reporting, use a 'cleanup' procedure to remove redundant readings.

Both of the above would be faster than having SQL execute a search every time
a new reading is added.
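The first suggestion can be sketched in a few lines; the table and variable names below are hypothetical illustrations, not anything from the thread:

```python
import sqlite3

# Application-side de-duplication: remember the last stored value in the
# program and skip the INSERT entirely for consecutive repeats, so no SQL
# lookup runs per reading. Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table data (key text primary key, value integer not null)")

last_value = None
def record(key, value):
    global last_value
    if value != last_value:                     # changed since last reading?
        conn.execute("insert into data values (?, ?)", (key, value))
        last_value = value

for k, v in [('10:32', 12), ('10:35', 15), ('10:37', 15),
             ('10:39', 13), ('10:43', 13), ('10:46', 18)]:
    record(k, v)

rows = conn.execute("select * from data").fetchall()
print(rows)  # [('10:32', 12), ('10:35', 15), ('10:39', 13), ('10:46', 18)]
```

As the message notes, this only works for a single stateful writer; multiple sensors or restarts need the database-side approaches discussed elsewhere in the thread.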
Re: [sqlite] Fwd: inserting new data only
Note that this will work for discrete data from a sensor but will not
properly historize continuous data. That is, if what you are historizing is
process data such as a temperature, this will not permit you to re-create the
original engineering data. For that you need to allow the last duplicate to
be stored, and also store the computed slope-to-prior with each append -- in
that case triggers would be the only way to do it.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a
lot about anticipated traffic volume.

>-Original Message-
>From: Keith Medcalf
>Sent: Tuesday, 25 February, 2020 11:18
>To: 'SQLite mailing list'
>Subject: RE: [sqlite] Fwd: inserting new data only
>
>create table data
>(
>    key  text primary key,
>    data integer not null
>)
>without rowid;
>
>-- insert into data select ?, ? as value where value IS NOT (select data
>from (select max(key), data from data));
>
>insert into data select '10:32', 12 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:35', 15 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:37', 15 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:39', 13 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:43', 13 as value where value IS NOT (select
>data from (select max(key), data from data));
>insert into data select '10:46', 18 as value where value IS NOT (select
>data from (select max(key), data from data));
>
>select * from data;
>10:32|12
>10:35|15
>10:39|13
>10:46|18
>
>Constraints:
>
>(1) Will only work for appending data (new key > all keys in table)
>(2) Types of key and data are immaterial as long as you are only
>inserting (appending) new keys.
>
>--
>The fact that there's a Highway to Hell but only a Stairway to Heaven
>says a lot about anticipated traffic volume.
>
>>-Original Message-
>>From: sqlite-users On
>>Behalf Of Przemek Klosowski
>>Sent: Tuesday, 25 February, 2020 10:02
>>To: SQLite mailing list
>>Subject: [sqlite] Fwd: inserting new data only
>>
>>I am storing time series data arriving from a sensor into (time,value)
>>records, like so:
>>10:32 12
>>10:35 15
>>10:37 15
>>10:39 13
>>10:43 13
>>10:46 18
>>
>>and I want to avoid storing repetitive data, so that the database should
>>contain
>>10:32 12
>>10:35 15
>>10:39 13
>>10:46 18
>>where only the earliest time with the unchanging value is stored.
>>
>>I don't see how INSERT could be conditional on e.g. value != (select
>>value from tbl order by time descending limit 1), so I thought I'd use
>>triggers. The only way I could think of was to delete the new
>>duplicate record after it has been inserted:
>>
>>create trigger cull after insert on tbl when
>>  (select value-lead(value) over (order by time desc) from a limit 1) = 0
>>begin
>>  delete from a where time like new.time;
>>end;
>>
>>Is there a simpler way?
Re: [sqlite] Fwd: inserting new data only
create table data
(
    key  text primary key,
    data integer not null
)
without rowid;

-- insert into data select ?, ? as value where value IS NOT (select data from (select max(key), data from data));

insert into data select '10:32', 12 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:35', 15 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:37', 15 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:39', 13 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:43', 13 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:46', 18 as value where value IS NOT (select data from (select max(key), data from data));

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

Constraints:

(1) Will only work for appending data (new key > all keys in table)
(2) Types of key and data are immaterial as long as you are only inserting
    (appending) new keys.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users On
>Behalf Of Przemek Klosowski
>Sent: Tuesday, 25 February, 2020 10:02
>To: SQLite mailing list
>Subject: [sqlite] Fwd: inserting new data only
>
>I am storing time series data arriving from a sensor into (time,value)
>records, like so:
>10:32 12
>10:35 15
>10:37 15
>10:39 13
>10:43 13
>10:46 18
>
>and I want to avoid storing repetitive data, so that the database should
>contain
>10:32 12
>10:35 15
>10:39 13
>10:46 18
>where only the earliest time with the unchanging value is stored.
>
>I don't see how INSERT could be conditional on e.g. value != (select
>value from tbl order by time descending limit 1), so I thought I'd use
>triggers.
>The only way I could think of was to delete the new
>duplicate record after it has been inserted:
>
>create trigger cull after insert on tbl when
>  (select value-lead(value) over (order by time desc) from a limit 1) = 0
>begin
>  delete from a where time like new.time;
>end;
>
>Is there a simpler way?
Re: [sqlite] Fwd: inserting new data only
On Tue, Feb 25, 2020 at 1:03 PM John McKown wrote:

> > I am storing time series data arriving from a sensor into (time,value)
> > records, like so:
> > 10:32 12
> > 10:35 15
> > 10:37 15
> > 10:39 13
> > 10:43 13
> > 10:46 18
> >
> > and I want to avoid storing repetitive data, so that the database should
> > contain
> > 10:32 12
> > 10:35 15
> > 10:39 13
> > 10:46 18
> > where only the earliest time with the unchanging value is stored.
...
> Why not:
>
> CREATE TABLE ME (ATIME TIME, VALUE INTEGER PRIMARY KEY);
>
> You can't INSERT duplicate numbers into the "VALUE" column, it will fail.

This won't work here because the same value COULD reappear later:

12 15 15 13 13 18 15

needs to be registered as

12 15 13 18 15
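The distinction being made above is consecutive de-duplication versus global uniqueness (which is what a UNIQUE or PRIMARY KEY column enforces). In Python terms, a sketch of the wanted behaviour:

```python
from itertools import groupby

# Consecutive de-duplication: collapse runs of equal readings, but a value
# may legitimately reappear after other values -- so a UNIQUE column on the
# value would wrongly reject the final 15.
readings = [12, 15, 15, 13, 13, 18, 15]
deduped = [value for value, _run in groupby(readings)]
print(deduped)  # [12, 15, 13, 18, 15]
```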
Re: [sqlite] Fwd: inserting new data only
On Tue, Feb 25, 2020 at 12:22 PM David Raymond wrote:
>
> A before trigger which uses the raise function would stop it from getting
> inserted in the first place.
>
> create trigger cull
> before insert on tbl
> when new.value = (select value from tbl order by time desc limit 1)
> begin
>   select raise(ignore);
> end;
>
> Or if you want it to actually return an error to let you know what happened
> you could make it
> select raise(abort, 'Repeated entry');

Ah, that's cool---I missed the 'ignore' possibility and thought that the
before trigger can't prevent the subsequent insert.

> -Original Message-
> From: sqlite-users On Behalf
> Of Przemek Klosowski
> Sent: Tuesday, February 25, 2020 12:02 PM
> To: SQLite mailing list
> Subject: [sqlite] Fwd: inserting new data only
>
> I am storing time series data arriving from a sensor into (time,value)
> records, like so:
> 10:32 12
> 10:35 15
> 10:37 15
> 10:39 13
> 10:43 13
> 10:46 18
>
> and I want to avoid storing repetitive data, so that the database should
> contain
> 10:32 12
> 10:35 15
> 10:39 13
> 10:46 18
> where only the earliest time with the unchanging value is stored.
>
> I don't see how INSERT could be conditional on e.g. value != (select
> value from tbl order by time descending limit 1), so I thought I'd use
> triggers. The only way I could think of was to delete the new
> duplicate record after it has been inserted:
>
> create trigger cull after insert on tbl when
>   (select value-lead(value) over (order by time desc) from a limit 1) = 0
> begin
>   delete from a where time like new.time;
> end;
>
> Is there a simpler way?
Re: [sqlite] Fwd: inserting new data only
On Tue, Feb 25, 2020 at 11:03 AM Przemek Klosowski <
przemek.klosowski+sql...@gmail.com> wrote:

> I am storing time series data arriving from a sensor into (time,value)
> records, like so:
> 10:32 12
> 10:35 15
> 10:37 15
> 10:39 13
> 10:43 13
> 10:46 18
>
> and I want to avoid storing repetitive data, so that the database should
> contain
> 10:32 12
> 10:35 15
> 10:39 13
> 10:46 18
> where only the earliest time with the unchanging value is stored.
>
> I don't see how INSERT could be conditional on e.g. value != (select
> value from tbl order by time descending limit 1), so I thought I'd use
> triggers. The only way I could think of was to delete the new
> duplicate record after it has been inserted:
>
> create trigger cull after insert on tbl when
>   (select value-lead(value) over (order by time desc) from a limit 1) = 0
> begin
>   delete from a where time like new.time;
> end;
>
> Is there a simpler way?

Why not:

CREATE TABLE ME (ATIME TIME, VALUE INTEGER PRIMARY KEY);

You can't INSERT duplicate numbers into the "VALUE" column, it will fail.

--
People in sleeping bags are the soft tacos of the bear world.
Maranatha! <><
John McKown
Re: [sqlite] Fwd: inserting new data only
A before trigger which uses the raise function would stop it from getting
inserted in the first place.

create trigger cull
before insert on tbl
when new.value = (select value from tbl order by time desc limit 1)
begin
  select raise(ignore);
end;

Or if you want it to actually return an error to let you know what happened
you could make it

select raise(abort, 'Repeated entry');

-Original Message-
From: sqlite-users On Behalf Of Przemek Klosowski
Sent: Tuesday, February 25, 2020 12:02 PM
To: SQLite mailing list
Subject: [sqlite] Fwd: inserting new data only

I am storing time series data arriving from a sensor into (time,value)
records, like so:
10:32 12
10:35 15
10:37 15
10:39 13
10:43 13
10:46 18

and I want to avoid storing repetitive data, so that the database should
contain
10:32 12
10:35 15
10:39 13
10:46 18
where only the earliest time with the unchanging value is stored.

I don't see how INSERT could be conditional on e.g. value != (select value
from tbl order by time descending limit 1), so I thought I'd use triggers.
The only way I could think of was to delete the new duplicate record after
it has been inserted:

create trigger cull after insert on tbl when
  (select value-lead(value) over (order by time desc) from a limit 1) = 0
begin
  delete from a where time like new.time;
end;

Is there a simpler way?
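The raise(ignore) trigger above can be replayed against the sample series; the Python driver code is illustrative, the SQL is David's as given:

```python
import sqlite3

# Replay the raise(ignore) before-trigger against the sample series: the
# trigger suppresses consecutive duplicate values before they are inserted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table tbl (time text, value integer);

create trigger cull
before insert on tbl
when new.value = (select value from tbl order by time desc limit 1)
begin
  select raise(ignore);
end;
""")

for t, v in [('10:32', 12), ('10:35', 15), ('10:37', 15),
             ('10:39', 13), ('10:43', 13), ('10:46', 18)]:
    conn.execute("insert into tbl values (?, ?)", (t, v))

rows = conn.execute("select * from tbl").fetchall()
print(rows)  # [('10:32', 12), ('10:35', 15), ('10:39', 13), ('10:46', 18)]
```

Note that on the very first insert the subquery returns NULL, the WHEN comparison is NULL (not true), and the trigger correctly does not fire.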
[sqlite] Fwd: inserting new data only
I am storing time series data arriving from a sensor into (time,value)
records, like so:
10:32 12
10:35 15
10:37 15
10:39 13
10:43 13
10:46 18

and I want to avoid storing repetitive data, so that the database should
contain
10:32 12
10:35 15
10:39 13
10:46 18
where only the earliest time with the unchanging value is stored.

I don't see how INSERT could be conditional on e.g. value != (select value
from tbl order by time descending limit 1), so I thought I'd use triggers.
The only way I could think of was to delete the new duplicate record after
it has been inserted:

create trigger cull after insert on tbl when
  (select value-lead(value) over (order by time desc) from a limit 1) = 0
begin
  delete from a where time like new.time;
end;

Is there a simpler way?
Re: [sqlite] After deleting data from a FTS table and doing VACUUM, unwanted data remains
Tuesday, February 25, 2020, 3:00:09 PM, Luuk wrote:

[tests snipped]
> So, the index does not grow indefinitely

> On 25-2-2020 14:00, Graham Holden wrote:
>> It is an interesting problem. And the above is just guesswork... It would
>> be good to verify experimentally that the index really does grow
>> indefinitely

Just to avoid (more) confusion, that speculation was from Dan's email from 2014 (I pasted his response, quoting the original email from andrewmo who raised the issue: perhaps I should have tried to add another layer of quoting...).

IIRC (and I probably don't), I think it was found that there wasn't any "grow indefinitely" involved. I also suspect the cyclic nature of the post-vacuum numbers (27..52..2..27..52) is indicative of what (I think) Dan was describing, namely that the "clean-up" isn't always as "aggressive" as it potentially could be:

> If you then add even more data so that the 16th level-N b-tree is created,
> everything gets merged together and we're back in the optimal state -
> everything in a single b-tree. However - this b-tree is deemed to be a
> level-N+1 b-tree. Meaning that this time, much more data will have to be
> added before everything is merged together again.

Regards,
Graham Holden
Re: [sqlite] After deleting data from a FTS table and doing VACUUM, unwanted data remains
script:

    #!/bin/bash
    if [ ! -f test.db ] ; then
      sqlite3 test.db "CREATE VIRTUAL TABLE tab USING fts5(x)"
    fi
    sqlite3 test.db ".import wikipedia tab"
    a=$(sqlite3 test.db "SELECT count(*) FROM tab_data")
    echo "# records after import: $a"
    sqlite3 test.db "DELETE FROM tab"
    a=$(sqlite3 test.db "SELECT count(*) FROM tab_data")
    echo "# records after DELETE: $a"
    sqlite3 test.db "VACUUM"
    a=$(sqlite3 test.db "SELECT count(*) FROM tab_data")
    echo "# records after vacuum: $a"

output:

    $> ./test.sh
    # records after import: 15
    # records after DELETE: 27
    # records after vacuum: 27
    $> ./test.sh
    # records after import: 40
    # records after DELETE: 52
    # records after vacuum: 52
    $> ./test.sh
    # records after import: 65
    # records after DELETE: 2
    # records after vacuum: 2
    $> ./test.sh
    # records after import: 15
    # records after DELETE: 27
    # records after vacuum: 27
    $> ./test.sh
    # records after import: 40
    # records after DELETE: 52
    # records after vacuum: 52

So, the index does not grow indefinitely.

On 25-2-2020 14:00, Graham Holden wrote:
> It is an interesting problem. And the above is just guesswork... It would
> be good to verify experimentally that the index really does grow
> indefinitely
Re: [sqlite] more efficient JSON encoding: idle musing
On Fri, Feb 21, 2020 at 6:03 AM Richard Hipp wrote:
> On 2/21/20, Wout Mertens wrote:
>> The idea is that upon storing the JSON data, the JSON1 extension parses
>> it, extracts the layouts recursively, stores them when they are not known
>> yet, and then only stores the values in the binary format with the layout
>> identifiers.
>
> I experimented with a number of similar ideas for storing JSON when I
> was first designing the JSON components for SQLite. I was never able
> to find anything that was as fast or as compact as just storing the
> original JSON text. But I could have overlooked something. If you
> have example code for a mechanism that is more space efficient and/or
> faster, please share it with us.

Text is as long as text is, and numbers, for small ranges, are also compressed to 2 bytes (one for a separator, or opener, and 1 for the value), which gets you 0-9 (0-64 if you base64 encode it)... Looking at just the data part of JSON, you end up with a lot of overhead from the repeated field-name definitions.

I created a format, https://github.com/d3x0r/jsox#jsox--javascript-object-exchange-format , that is compatible with existing JSON but adds the ability to specify 'class' definitions. There's a specification of the grammar in BNF format, and pictures... The parser tracks the current parsing state; the initial state, 0, is called 'unknown'. If a string is found in an unknown state, followed by an object, then that defines a class type: 'record{id,title,author,data}'. After that, a second occurrence in the unknown state, or within an array or object context, '[record{1342,"book","person",1}]', would use the existing list of names, in order, with the values, and build an object that was { id:1342, title:"book", author:"person", data:1 }. 'record' could be shortened to any single Unicode character; otherwise the saving isn't so great.
The definition of 'string' is somewhat loose in JSOX: as long as there isn't a format-control character (whitespace, ':', '{', '}', '[', ']') you don't need quotes around a sequence of characters to make a string; excepting, of course, sequences starting with characters that look like a number, and/or matching a keyword... The triggering of the class mode is a '{' after a string, or while collecting a string.

I also extended the number format of JSON to allow specifying ISO-8601 times as numbers (you just have to special-case, in addition to '.', the characters ':', 'T', 'Z' and '-' (inline and not just at the start)).

Other than that: if space is really a concern, maybe a zip layer?

J

> --
> D. Richard Hipp
> d...@sqlite.org
Re: [sqlite] After deleting data from a FTS table and doing VACUUM, unwanted data remains
This might be to do with how an FTS index works under the hood, involving various levels of "b-tree" that grow as entries are added, but aren't always shrunk when entries are deleted.

There were a bunch of emails on the list around the 4th to 13th May 2014; sample below from Dan Kennedy (one of the SQLite devs), which includes the original problem report and his summary of "why". I've a feeling SOME changes MIGHT have been made to how the various levels of b-tree were merged when entries were deleted, but I'm not seeing anything linked to the original thread... perhaps this email will jog a memory...

Graham

START - Mail from Dan Kennedy, 4 May 2014 - START

On 05/01/2014 03:30 PM, andrewmo wrote:
> We are using the FTS3 extension to sqlite to store large numbers of short
> (~300 byte) documents. This is working very well and providing us with very
> fast text search, but the behaviour around deletion of documents has me
> confused.
>
> Our system must control the size of the database and will delete the
> oldest documents when the database size breaches a certain limit. I now
> understand from comments on this mailing list and elsewhere that this is
> not an optimal pattern for the FTS extension, as doclists for the oldest
> documents are the least likely to be 'merged'.
>
> My question is: does this actually work at all? If I delete a row from my
> FTS4 table (resulting in a new empty doclist being added to the index), and
> I subsequently add many (1000s of) new documents and call the 'merge'
> function several times (automerge is also enabled), is there any guarantee
> that the empty doclist and the populated doclist it superseded will ever
> be removed? My testing suggests this isn't the case.
>
> I have a 1GB database with 6 million documents.
> If I keep adding new documents at around 1 per second, and deleting
> documents when the size of the data goes beyond 1GB, the size of the index
> seems to grow and the number of documents I can store in the 1GB file
> seems to decrease in a linear manner.
>
> Calling the 'optimize' function seems to solve this issue (removing all
> the dead doclists), but that isn't practical for our software, as it
> implies some downtime for our high-availability service due to the long
> execution time of the optimize function (could be minutes for a 1GB file).
>
> I have seen this post from 2008:
> http://sqlite.1065341.n5.nabble.com/fts3-database-grows-td42069.html
> However, it predates the 'automerge' and manual merge features, and from
> the documentation I assumed these new features would delete all the data
> related to deleted documents. Am I incorrect in my assumption?
>
> Thanks for any clarification you can offer.

Normally, when you write to an FTS index (either to add new doclists or to add delete markers) the new entries are accumulated in memory for a while and then flushed to a new "level-0" b-tree. A level-0 b-tree is often roughly 1MB in size. Once there are 16 level-0 b-trees, they are merged and written to a single level-1 b-tree. Once there are 16 level-1 b-trees... and so on.

So when an entry is deleted from the FTS index, a delete marker is added. But the original doclists are not actually deleted until the delete marker and the doclists are merged into the same b-tree. Delete markers are discarded when they are merged into the oldest b-tree in the index.

At first glance it seems (to me) that this means the index might grow to anything up to 16 times its "optimized" size. But I think it's actually worse than that. Say your entire database fits into a single level-N b-tree. You keep adding data (and delete markers) until there are 15 level-N b-trees and almost enough data to create the 16th in lower levels.
So at this point the FTS index is 16 times its optimal size. If you then add even more data so that the 16th level-N b-tree is created, everything gets merged together and we're back in the optimal state - everything in a single b-tree. However - this b-tree is deemed to be a level-N+1 b-tree. Meaning that this time, much more data will have to be added before everything is merged together again.

So I'm thinking a solution might be:

* Fix FTS so that it picks up this case - when a merge includes so many
  delete markers that the output is small enough to be deemed a level-N
  b-tree, not a level-N+1 b-tree, and

* Instead of using the default 16-way merges, the app could arrange to
  periodically invoke the "merge=X,Y" command with a smaller Y value (say 2)
  to limit the maximum size of the index to Y times its optimal size
  (instead of 16 times).

It is an interesting problem. And the above is just guesswork... It would be good to verify experimentally that the index really does grow indefinitely with this kind of input before trying to "fix" anything.

Dan.

ENDS - Mail from Dan Kennedy, 4 May 2014 - ENDS

Tuesday, February 25, 2020, 11:52:59 AM, Matt Kloss wrote:
[sqlite] After deleting data from a FTS table and doing VACUUM, unwanted data remains
Dear sqlite users,

I noticed that when you delete lines from an FTS virtual table, somehow some data remains in the sqlite db, so that it does not shrink much in size.

    $ sqlite3 test.sql "CREATE VIRTUAL TABLE tab USING fts5(x)"
    $ curl -s https://www.wikipedia.org | tr -cd '[:alnum:][:space:]' > wikipedia
    $ ls -lh wikipedia test.sql
    -rw-r--r-- 1 mkloss mkloss 24K feb. 25 06:55 test.sql
    -rw-r--r-- 1 mkloss mkloss 54K feb. 25 06:56 wikipedia
    $ sqlite3 test.sql ".import wikipedia tab" && ls -lh test.sql
    -rw-r--r-- 1 mkloss mkloss 148K feb. 25 06:56 test.sql
    $ sqlite3 test.sql "delete from tab" && ls -lh test.sql
    -rw-r--r-- 1 mkloss mkloss 148K feb. 25 06:56 test.sql
    $ sqlite3 test.sql "VACUUM" && ls -lh test.sql
    -rw-r--r-- 1 mkloss mkloss 124K feb. 25 06:56 test.sql

I would expect the db size to be 24K (not 124K), as it was when the table "tab" was empty. I noticed that some data remains in the tab_XXX tables, but less than 700 bytes. That's nowhere near the 100K of added cruft:

    $ for t in tab_{config,content,data,docsize,idx}; do echo "select * from $t;"; done | sqlite3 test.sql | wc -c
    682

So here are my questions:

(1) How do you really clean up a db with FTS tables after deleting some lines?
(2) If there is no way to remove the cruft, does that mean that adding and deleting lines will constantly inflate the db size?

    $ sqlite3 --version
    3.31.1 2020-01-27 19:55:54 3bfa9cc97da10598521b342961df8f5f68c7388fa117345eeb516eaa837balt1

Thank you for your help,

Regards,
Matthew