[sqlite] sqliteDefaultBusyCallback and HAVE_USLEEP

2020-02-25 Thread Peter Kolbus
I noticed that sqliteDefaultBusyCallback() seems to depend directly on the OS
(it behaves differently based on SQLITE_OS_WIN || HAVE_USLEEP). Since the
underlying primitive, sqlite3OsSleep(), actually uses the VFS to sleep, and
unixSleep() also rounds up to whole seconds when HAVE_USLEEP is not defined,
any time-resolution limitations are already handled there. And when a custom
VFS is configured, that VFS may well be able to sleep in milliseconds or
microseconds using an RTOS-specific function that is not usleep() -- for
example, FreeRTOS has osDelay().

Is there a reason sqliteDefaultBusyCallback() has this dual implementation?
Is defining HAVE_USLEEP the correct way to get finer busy-retry resolution on
platforms that don't have usleep() but whose VFS can sleep in sub-second
units, or could this be simplified?
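
For context: sqliteDefaultBusyCallback() is, as far as I can tell, the
handler that sqlite3_busy_timeout() installs -- i.e. what runs when you set,
for example:

PRAGMA busy_timeout = 5000;   -- retry for up to 5000 ms on SQLITE_BUSY

so the sleep granularity directly determines how the busy retries are paced.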

Thanks,
Peter


Re: [sqlite] more efficient JSON encoding: idle musing

2020-02-25 Thread Jens Alfke


> On Feb 25, 2020, at 6:12 AM, J Decker  wrote:
> 
> other than that; if space is really a concern, maybe a zip layer?

In my experience, the concern is more about speed than size. Given the raw
string/blob data from a SQLite column, and a specific property name/path, how
fast can you find its value, convert it to a format SQLite's query engine
understands, and return it?
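
Concretely, the operation in question is something like the following, where
json_extract() has to re-parse the stored JSON text on every call (the table
and column names are just for illustration):

select json_extract(body, '$.customer.address.city')
  from orders
 where id = 42;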

The JSON1 extension's parser seems pretty darn fast, but given the nature of 
JSON, it has to do a bunch of scanning that's O(n) with the length of the data. 
It's generally impossible to find a value in the middle of a JSON string 
without examining every single byte that came before it.

The way to go faster is to use a format that lets you jump through the
structure. For example, tokenizing dictionary keys and encoding an object's
keys as a fixed-width sorted list lets you look up a key in O(log n) time, and
the constant factor is very small because you're comparing integers, not
strings.

I don't know if extracting 'classes' will help much in a query. The data will 
be smaller, and it makes it extremely fast to look up values if you know the 
class ahead of time, but in a query you don't know the class. Compared to what 
I described above, there's an extra step where you have to look up the class 
description from the object.

I also worry about use cases where the number of 'classes' becomes unwieldy, 
because the schema might be using a huge set of keys. For example, something 
where a customer order contains an object that maps SKUs to quantities, like 
{"A73563M1": 3, "A73522M0": 7, …}. And there are tens of thousands of SKUs.

—Jens


Re: [sqlite] Fwd: inserting new data only

2020-02-25 Thread Keith Medcalf

I keep forgetting that the min/max optimization is not applied when other
data is retrieved from the table at the same time, so if you actually want to
optimize the generated plan, you need to use the following trigger:

create trigger data_insert before insert on data
begin
select raise(ABORT, 'Out of order insert')
 where julianday(new.key) <= coalesce((select julianday(max(key)) from data), -1);
select raise(IGNORE)
 where new.data IS (select data from data order by key desc limit 1);
end;

The query optimizer is smart enough to recognize that when max(key) is used
by itself and an index is available on key, the index can be used to locate
the row containing the max key (it is the last one, and there is no need to
look any further). However, if you select max(key), data from the table, the
optimization is not applied and a scan of the whole table is done looking for
max(key), even though there is a suitable index; to get the fast plan you
have to spell out the index order yourself and say that you only need the
first result.
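
That is what the trigger above does: it phrases the lookup so that the index
order does the work. A sketch of the two plans:

-- uses the index on key: walk to the last entry and stop
select data from data order by key desc limit 1;

-- full table scan: the min/max optimization does not apply when other
-- columns are selected alongside max(key)
select max(key), data from data;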

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users  On
>Behalf Of Keith Medcalf
>Sent: Tuesday, 25 February, 2020 14:44
>To: SQLite mailing list 
>Subject: Re: [sqlite] Fwd: inserting new data only
>
>[quoted message snipped; this message appears in full as the next entry below]

Re: [sqlite] Fwd: inserting new data only

2020-02-25 Thread Keith Medcalf

If you are going to do it all in one insert statement rather than using a
before trigger to throw an error (i.e., you want to silently ignore
out-of-order inserts), then the following is slightly more efficient, since
the query planner appears to materialize the search so it is only done once:

with old(key, data)
  as (
      select coalesce(julianday(max(key)), -1),
             data
        from data
     ),
     new(key, data)
  as (
      values (?, ?)
     )
insert into data
select new.key, new.data
  from new, old
 where new.data IS NOT old.data
   and julianday(new.key) > old.key;

However, without the trigger the database will not enforce the monotonicity
of the insert timestamps ...

Note that you could do the whole thing in before triggers, which would mean
you just use a regular old insert and the triggers do all the work; the
database would then entirely enforce its own integrity and rules ... no
matter who or what was trying to insert records ...

create table data
(
key    text primary key,
data   integer not null
)
without rowid;

create trigger data_prohibit_oo_inserts before insert on data
  when julianday(new.key) <= (select julianday(max(key)) from data)
begin
select raise(ABORT, 'Out of order insert');
end;

create trigger data_prohibit_duplicates before insert on data
  when new.data IS (select data from (select max(key), data from data))
begin
select raise(IGNORE);
end;


-- insert into data values (?, ?);

insert into data values ('10:32', 12);
insert into data values ('10:35', 15);
insert into data values ('10:37', 15);
insert into data values ('10:39', 13);
insert into data values ('10:43', 13);
insert into data values ('10:46', 18);

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

insert into data values ('10:32', 12);
Error: near line 33: Out of order insert
insert into data values ('10:35', 15);
Error: near line 34: Out of order insert
insert into data values ('10:37', 15);
Error: near line 35: Out of order insert
insert into data values ('10:39', 13);
Error: near line 36: Out of order insert
insert into data values ('10:43', 13);
Error: near line 37: Out of order insert
insert into data values ('10:46', 18);

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

You could even do that with just one before trigger ...

create table data
(
key    text primary key,
data   integer not null
)
without rowid;

create trigger data_insert before insert on data
begin
select raise(ABORT, 'Out of order insert')
 where julianday(new.key) <= coalesce((select julianday(max(key)) from data), -1);
select raise(IGNORE)
 where new.data IS (select data from (select max(key), data from data));
end;

-- insert into data values (?, ?);

insert into data values ('10:32', 12);
insert into data values ('10:35', 15);
insert into data values ('10:37', 15);
insert into data values ('10:39', 13);
insert into data values ('10:43', 13);
insert into data values ('10:46', 18);

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

insert into data values ('10:32', 12);
Error: near line 28: Out of order insert
insert into data values ('10:35', 15);
Error: near line 29: Out of order insert
insert into data values ('10:37', 15);
Error: near line 30: Out of order insert
insert into data values ('10:39', 13);
Error: near line 31: Out of order insert
insert into data values ('10:43', 13);
Error: near line 32: Out of order insert
insert into data values ('10:46', 18);
Error: near line 33: Out of order insert

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users  On
>Behalf Of Keith Medcalf
>Sent: Tuesday, 25 February, 2020 13:15
>To: SQLite mailing list 
>Subject: Re: [sqlite] Fwd: inserting new data only
>
>
>[quoted message snipped; this message appears in full as the next entry below]

Re: [sqlite] Fwd: inserting new data only

2020-02-25 Thread Keith Medcalf

On Tuesday, 25 February, 2020 12:23, Przemek Klosowski 
 wrote:

>On Tue, Feb 25, 2020 at 1:18 PM Keith Medcalf  wrote:

>> create table data
>> (
>> key    text primary key,
>> data   integer not null
>> )
>> without rowid;
>>
>> -- insert into data select ?, ? as value where value IS NOT (select data from (select max(key), data from data));
>>..
>> Constraints:
>>
>> (1) Will only work for appending data (new key > all keys in table)
>> (2) Types of key and data are immaterial as long as you are only inserting (appending) new keys.

>Awesome---exactly what's needed.
>The monotonicity of the time key variable is assured by how the data
>is collected---but  is there a way to express that in sqlite?
>create table data (
>  key text primary key check
>(julianday(key) > julianday(select max(key) from data),
>  data integer not null);

You cannot do this with a CHECK constraint since check constraints cannot
execute select statements (check constraints should be table invariant --
meaning that they must return the same result no matter what data is in the
table or other tables -- so they act only to validate the data on the current
row).
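
For what it's worth, SQLite rejects the attempt outright at CREATE TABLE
time; a sketch, with the error text quoted from memory:

create table data
(
    key    text primary key
           check (julianday(key) > (select julianday(max(key)) from data)),
    data   integer not null
);
-- Error: subqueries prohibited in CHECK constraints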

This is a case for a before insert trigger that prohibits the insert before
it is performed. (An after trigger would also work, but it would fire after
the row is already inserted and would have to undo it with a statement
rollback; since you want to avoid the row insertion completely, the way to do
that is with a before trigger):

create trigger data_prohibit_oo_inserts before insert on data 
  when julianday(new.key) <= (select julianday(max(key)) from data)
begin
select raise(ABORT, 'Out of order insert');
end;

This does mean a lookup and check each time a record is to be inserted, but
the btree will already be in memory and will already have been traversed to
the last entry, so this will consume CPU only, and very little at that.

create table data
(
key    text primary key,
data   integer not null
)
without rowid;

create trigger data_prohibit_oo_inserts before insert on data
  when julianday(new.key) <= (select julianday(max(key)) from data)
begin
select raise(ABORT, 'Out of order insert');
end;


-- insert into data select ?, ? as value where value IS NOT (select data from (select max(key), data from data));

insert into data select '10:32', 12 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:35', 15 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:37', 15 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:39', 13 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:43', 13 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:46', 18 as value where value IS NOT (select data from (select max(key), data from data));

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

insert into data select '10:32', 12 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 27: Out of order insert
insert into data select '10:35', 15 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 28: Out of order insert
insert into data select '10:37', 15 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 29: Out of order insert
insert into data select '10:39', 13 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 30: Out of order insert
insert into data select '10:43', 13 as value where value IS NOT (select data from (select max(key), data from data));
Error: near line 31: Out of order insert
insert into data select '10:46', 18 as value where value IS NOT (select data from (select max(key), data from data));

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

You could also change the insert like this:

insert into data
 select ? as key, ? as value
  where value IS NOT (select data from (select max(key), data from data))
    and (julianday(key) > coalesce((select julianday(max(key)) from data), -1));

but that would just silently ignore the error rather than raising one. (You
need the coalesce because "select julianday(max(key)) from data" could be
null, and it has to be non-null for the > expression to return a non-null
result: anything > null returns null, which is FALSE in a where clause, and
NOT does not help, since "NOT (anything > null)" is still null, hence false.
The trigger does not have this problem because a NULL result simply means the
trigger does not fire.) Of course, changing the raise(ABORT ...) in the
trigger to raise(IGNORE) achieves the same result.
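
A quick illustration of the NULL propagation -- both expressions below
evaluate to NULL, which a WHERE clause treats as false:

select (julianday('10:32') > NULL),       -- NULL
       NOT (julianday('10:32') > NULL);   -- still NULL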

create table data
(
key    text 

Re: [sqlite] Fwd: inserting new data only

2020-02-25 Thread David Raymond
> Awesome---exactly what's needed.
> The monotonicity of the time key variable is assured by how the data
> is collected---but is there a way to express that in sqlite?
> create table data (
>   key text primary key check
> (julianday(key) > julianday(select max(key) from data),
>   data integer not null);


That would/should be done in a trigger, and not a check constraint. A check
constraint is only supposed to be something which will _always_ be true about
that one and only record, which needs only the contents of that one record to
evaluate, and not something that might change depending on... anything else.


Re: [sqlite] Fwd: inserting new data only

2020-02-25 Thread Przemek Klosowski
On Tue, Feb 25, 2020 at 1:18 PM Keith Medcalf  wrote:
> create table data
> (
> key    text primary key,
> data   integer not null
> )
> without rowid;
>
> -- insert into data select ?, ? as value where value IS NOT (select data from (select max(key), data from data));
>..
> Constraints:
>
> (1) Will only work for appending data (new key > all keys in table)
> (2) Types of key and data are immaterial as long as you are only inserting 
> (appending) new keys.

Awesome---exactly what's needed.
The monotonicity of the time key variable is assured by how the data
is collected---but  is there a way to express that in sqlite?
create table data (
  key text primary key check
(julianday(key) > julianday(select max(key) from data),
  data integer not null);


Re: [sqlite] inserting new data only

2020-02-25 Thread Simon Slavin
This strikes me as best solved in the programming language.

If a single set of data points is being acquired in real time, and you have a
programming language (or script) generating the INSERT commands, why not
simply keep the most recently inserted temperature in a variable?

On the other hand, if you have multiple sensors, or out-of-order insertion, or 
a stateless insertion program, you could insert every reading and before 
reporting use a 'cleanup' procedure to remove redundant readings.

Both the above would be faster than having SQL execute a search every time a 
new reading is added.
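
A sketch of such a cleanup, assuming the tbl(time, value) layout from the
original post and SQLite 3.25+ for window functions; it deletes every reading
whose value matches the immediately preceding one:

delete from tbl
 where time in (select time
                  from (select time, value,
                               lag(value) over (order by time) as prev
                          from tbl)
                 where value = prev);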


Re: [sqlite] Fwd: inserting new data only

2020-02-25 Thread Keith Medcalf

Note that this will work for discrete data from a sensor but will not
properly historize continuous data. That is, if what you are historizing is
process data such as a temperature, this will not permit you to re-create the
original engineering data. For that you need to allow the last duplicate to
be stored, and also store with each append the computed slope to the prior
point -- in that case triggers would be the only way to do it.
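
A rough sketch of what the slope bookkeeping might look like; the schema and
the after-insert update below are illustrative only, not a tested historizer,
and the duplicate-handling is omitted:

create table data
(
    key    text primary key,
    data   integer not null,
    slope  real                -- slope from the prior point to this one
)
without rowid;

create trigger data_slope after insert on data
begin
    update data
       set slope = (new.data - (select data from data
                                 where key < new.key
                                 order by key desc limit 1))
                   / (julianday(new.key) - (select julianday(max(key))
                                              from data
                                             where key < new.key))
     where key = new.key;
end;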

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: Keith Medcalf 
>Sent: Tuesday, 25 February, 2020 11:18
>To: 'SQLite mailing list' 
>Subject: RE: [sqlite] Fwd: inserting new data only
>
>
>[quoted message snipped; this message appears in full as the next entry below]





Re: [sqlite] Fwd: inserting new data only

2020-02-25 Thread Keith Medcalf

create table data
(
key    text primary key,
data   integer not null
)
without rowid;

-- insert into data select ?, ? as value where value IS NOT (select data from (select max(key), data from data));

insert into data select '10:32', 12 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:35', 15 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:37', 15 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:39', 13 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:43', 13 as value where value IS NOT (select data from (select max(key), data from data));
insert into data select '10:46', 18 as value where value IS NOT (select data from (select max(key), data from data));

select * from data;
10:32|12
10:35|15
10:39|13
10:46|18

Constraints:

(1) Will only work for appending data (new key > all keys in table)
(2) Types of key and data are immaterial as long as you are only inserting 
(appending) new keys.

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users  On
>Behalf Of Przemek Klosowski
>Sent: Tuesday, 25 February, 2020 10:02
>To: SQLite mailing list 
>Subject: [sqlite] Fwd: inserting new data only
>
>[quoted message snipped; the original post appears in full below]





Re: [sqlite] Fwd: inserting new data only

2020-02-25 Thread Przemek Klosowski
On Tue, Feb 25, 2020 at 1:03 PM John McKown
 wrote:
> > [quoted original post snipped; it appears in full below]
...
> Why not:
>
> CREATE TABLE ME (ATIME TIME, VALUE INTEGER PRIMARY KEY);
>
> You can't INSERT duplicate numbers into the "VALUE" column, it will fail.

This won't work here because the same value COULD reappear later:
12 15 15 13 13 18 15
needs to be registered as 12 15 13 18 15


Re: [sqlite] Fwd: inserting new data only

2020-02-25 Thread Przemek Klosowski
On Tue, Feb 25, 2020 at 12:22 PM David Raymond  wrote:
>
> A before trigger which uses the raise function would stop it from getting 
> inserted in the first place.
>
> create trigger cull
> before insert on tbl
> when new.value = (select value from tbl order by time desc limit 1)
> begin
> select raise(ignore);
> end;
>
> Or if you want it to actually return an error to let you know what happened 
> you could make it
> select raise(abort, 'Repeated entry');

Ah, that's cool---I missed the 'ignore' possibility and thought that
the before trigger can't prevent the subsequent insert.

>
> [remainder of quoted message snipped; the original post appears in full below]


Re: [sqlite] Fwd: inserting new data only

2020-02-25 Thread John McKown
On Tue, Feb 25, 2020 at 11:03 AM Przemek Klosowski <przemek.klosowski+sql...@gmail.com> wrote:

> [quoted message snipped; the original post appears in full below]


Why not:

CREATE TABLE ME (ATIME TIME, VALUE INTEGER PRIMARY KEY);

You can't INSERT duplicate numbers into the "VALUE" column, it will fail.
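
That is, a sketch of the failure mode (the exact error text is from memory):

CREATE TABLE ME (ATIME TIME, VALUE INTEGER PRIMARY KEY);
INSERT INTO ME VALUES ('10:32', 12);
INSERT INTO ME VALUES ('10:35', 12);
-- Error: UNIQUE constraint failed: ME.VALUE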


-- 
People in sleeping bags are the soft tacos of the bear world.
Maranatha! <><
John McKown


Re: [sqlite] Fwd: inserting new data only

2020-02-25 Thread David Raymond
A before trigger which uses the raise function would stop it from getting 
inserted in the first place. 

create trigger cull
before insert on tbl
when new.value = (select value from tbl order by time desc limit 1)
begin
select raise(ignore);
end;

Or if you want it to actually return an error to let you know what happened you 
could make it
select raise(abort, 'Repeated entry');



-Original Message-
From: sqlite-users  On Behalf Of 
Przemek Klosowski
Sent: Tuesday, February 25, 2020 12:02 PM
To: SQLite mailing list 
Subject: [sqlite] Fwd: inserting new data only

[quoted message snipped; the original post appears in full below]


[sqlite] Fwd: inserting new data only

2020-02-25 Thread Przemek Klosowski
I am storing time series data arriving from a sensor into (time,value)
records, like so:
10:32  12
10:35  15
10:37  15
10:39  13
10:43  13
10:46  18

and I want to avoid storing repetitive data, so that the database should contain
10:32  12
10:35  15
10:39  13
10:46  18
where only the earliest time with the unchanging value is stored.

I don't see how INSERT could be conditional on e.g.  value != (select
value from tbl order by time descending limit 1), so I thought I'd use
triggers. The only way I could think of was to delete the new
duplicate record after it has been inserted:

create trigger cull after insert on tbl when
 (select value - lead(value) over (order by time desc) from tbl limit 1) = 0
begin
   delete from tbl where time like new.time;
end;

Is there a simpler way?


Re: [sqlite] After deleting data from a FTS table and doing VACUUM, unwanted data remains

2020-02-25 Thread Graham Holden
Tuesday, February 25, 2020, 3:00:09 PM, Luuk  wrote:

[tests snipped]

> So, the index does not grow indefinitely

> On 25-2-2020 14:00, Graham Holden wrote:
>> It is an interesting problem. And the above is just guesswork... It would
>> be good to verify experimentally that the index really does grow
>> indefinitely

Just to avoid (more) confusion, that speculation was from Dan's email
from 2014 (I pasted his response, quoting the original email from
andrewmo who raised the issue: perhaps I should have tried to add
another layer of quoting...)

IIRC (and I probably don't), I think it was found that there wasn't
any "grow indefinitely" involved. I also suspect the cyclic nature of
the post-vacuum numbers (27..52..2..27..52) is indicative of what (I
think) Dan was describing, namely that the "clean-up" isn't always as
"aggressive" as it potentially could be:

>  If you then add even more data so that
> the 16th level-N b-tree is created, everything gets merged together and
> we're back in the optimal state - everything in a single b-tree. However -
> this b-tree is deemed to be a level-N+1 b-tree. Meaning that this time,
> much more data will have to be added before everything is merged together
> again.

Regards,
Graham Holden




Re: [sqlite] After deleting data from a FTS table and doing VACUUM, unwanted data remains

2020-02-25 Thread Luuk

script:

#!/bin/bash

if [ ! -f test.db ] ; then sqlite3 test.db "CREATE VIRTUAL TABLE tab USING fts5(x)"; fi

sqlite3 test.db ".import wikipedia tab"
a=$(sqlite3 test.db "SELECT count(*) FROM tab_data")
echo "# records after import: $a"
sqlite3 test.db "DELETE FROM tab"
a=$(sqlite3 test.db "SELECT count(*) FROM tab_data")
echo "# records after DELETE: $a"
sqlite3 test.db "VACUUM"
a=$(sqlite3 test.db "SELECT count(*) FROM tab_data")
echo "# records after vacuum: $a"

output:

$> ./test.sh
# records after import: 15
# records after DELETE: 27
# records after vacuum: 27
$> ./test.sh
# records after import: 40
# records after DELETE: 52
# records after vacuum: 52
$> ./test.sh
# records after import: 65
# records after DELETE: 2
# records after vacuum: 2
$> ./test.sh
# records after import: 15
# records after DELETE: 27
# records after vacuum: 27
$> ./test.sh
# records after import: 40
# records after DELETE: 52
# records after vacuum: 52

So, the index does not grow indefinitely

On 25-2-2020 14:00, Graham Holden wrote:

> It is an interesting problem. And the above is just guesswork... It would
> be good to verify experimentally that the index really does grow
> indefinitely



Re: [sqlite] more efficient JSON encoding: idle musing

2020-02-25 Thread J Decker
On Fri, Feb 21, 2020 at 6:03 AM Richard Hipp  wrote:

> On 2/21/20, Wout Mertens  wrote:
> > The idea is that upon storing the JSON
> > data, the JSON1 extension parses it, extracts the layouts recursively,
> > stores them when they are not known yet, and then only stores the
> > values in the binary format with the layout identifiers.
>
> I experimented with a number of similar ideas for storing JSON when I
> was first designing the JSON components for SQLite.  I was never able
> to find anything that was as fast or as compact as just storing the
> original JSON text.  But I could have overlooked something.  If you
> have example code for a mechanism that is more space efficient and/or
> faster, please share it with us.
>

Text is as long as text is, and numbers with small ranges also compress to
2 bytes (one for a separator or opener, and 1 for the value), which gets you
0-9 (0-64 if you base64-encode it) -- looking at just the data part of JSON.
The overhead mostly comes from the repeated field-name definitions.

I created a format
https://github.com/d3x0r/jsox#jsox--javascript-object-exchange-format that is
compatible with existing JSON, but adds the ability to specify 'class'
definitions. There's a specification of the grammar in BNF format, and
pictures... It tracks the current parsing state, the initial state (0) being
called 'unknown'. If a string is found in an unknown state, followed by an
object, then that defines a class type... 'record{id,title,author,data}'.
After that, a second occurrence in the unknown state, or within an array or
object context -- '[record{1342,"book","person",1}]' -- would use the
existing list of names, in order, with the values, and build an object that
was { id:1342, title:"book", author:"person", data:1 }. 'record' could be
shortened to any single unicode character; otherwise the saving isn't so
great.
The definition of 'string' is somewhat loose in JSOX: as long as there isn't
a format control character (whitespace, ':', '{', '}', '[', ']'), you don't
need quotes around a sequence of characters to make a string -- excepting, of
course, sequences starting with characters that look like a number, and/or
matching a keyword... The mode is triggered by a '{' after a string, or while
collecting a string.

I also extended the number format of JSON to allow specifying ISO-8601 times
as numbers (you just have to special-case ':', 'T', 'Z' and '-' (inline, and
not just at the start) in addition to '.').

other than that; if space is really a concern, maybe a zip layer?

J


> --
> D. Richard Hipp
> d...@sqlite.org


Re: [sqlite] After deleting data from a FTS table and doing VACUUM, unwanted data remains

2020-02-25 Thread Graham Holden
This might be to do with how an FTS index works under the hood,
involving various levels of "b-tree" that grow as entries are added,
but aren't always shrunk when entries are deleted.

There were a bunch of emails on the list around 4th to the 13th May
2014: sample below from Dan Kennedy (one of the SQLite devs) which
includes the original problem report and his summary of "why".

I've a feeling SOME changes MIGHT have been made to how the various
levels of b-tree were merged when entries were deleted, but I'm not
seeing anything linked to the original thread... perhaps this email
will jog a memory...

Graham


START - Mail from Dan Kennedy, 4 May 2014 - START

On 05/01/2014 03:30 PM, andrewmo wrote:
> We are using the FTS3 extension to sqlite to store large numbers of short
> (~300 byte) documents. This is working very well and providing us with very
> fast text search, but the behaviour around deletion of documents has me
> confused.
>
> Our system must control the the size of the database and will delete the
> oldest documents when the database size breaches a certain limit. I now
> understand from comments on this mailing list and elsewhere that this is not
> an optimal pattern for the FTS extension as doclists for the oldest
> documents are the least likely to be 'merged'.
>
> My question is, does this actually work at all? If I delete a row from my
> FTS4 table (resulting in a new empty doclist being added to the index), then
> I subsequently add many (1000s) new documents and call the 'merge' function
> several times (automerge is also enabled), is there any gaurentee that the
> empty doclist and the populated doclist that it superseded will ever be
> removed? My testing suggests this isn't the case.
>
> I have a 1GB database with 6million documents. If I keep adding new
> documents at around 1 per second and deleting documents when the size of the
> data goes beyond 1GB, the size of the index seems to grow and the number of
> documents I can store in the 1GB file seems decrease in a linear manner.
>
> Calling the 'optimize' function seems to solve this issue (removing all the
> dead doclists), but that isn't practical for our software, as it implies
> some downtime for our high availablity service due to the long execution
> time of the optimize function (Could be minutes for a 1GB file).
>
> I have seen this
> (http://sqlite.1065341.n5.nabble.com/fts3-database-grows-td42069.html) post
> from 2008. However, it predates the 'automerge' and manual merge features,
> and from the documentation I assumed these new features would delete all the
> data related to deleted documents. Am I incorrect in my assumption?
>
> Thanks for any clarification you can offer.

Normally, when you write to an FTS index (either to add new doclists or to
add delete markers) the new entries are accumulated in-memory for a while
and then flushed to a new "level-0" b-tree. A level-0 b-tree is often
roughly 1MB in size. Once there are 16 level-0 b-trees, they are merged
and written to a single level-1 b-tree. Once there are 16 level-1 b-trees...
And so on.

So when an entry is deleted from the FTS index, a delete marker is added.
But the original doclists are not actually deleted until the delete marker
and the doclists are merged into the same b-tree. Delete markers are
discarded when they are merged into the oldest b-tree in the index.

At first glance it seems (to me) that this means the index might grow to
anything up to 16 times its "optimized" size. But I think it's actually
worse than that.

Say your entire database fits into a single level-N b-tree. You keep adding
data (and delete markers) until there are 15 level-N b-trees and almost
enough data to create the 16th in lower levels. So at this point the FTS
index is 16 times its optimal size. If you then add even more data so that
the 16th level-N b-tree is created, everything gets merged together and
we're back in the optimal state - everything in a single b-tree. However -
this b-tree is deemed to be a level-N+1 b-tree. Meaning that this time,
much more data will have to be added before everything is merged together
again.

So I'm thinking a solution might be:

  * Fix FTS so that it picks this case - when a merge includes so many
    delete markers that the output is small enough to be deemed a level-N
    b-tree, not a level-N+1 b-tree, and

  * Instead of using the default 16-way merges, the app could organize
    to periodically invoke the "merge=X,Y" command with a smaller Y value
    (say 2) to limit the maximum size of the index to Y times its optimal
    size (instead of 16 times).

It is an interesting problem. And the above is just guesswork... It would
be good to verify experimentally that the index really does grow
indefinitely with this kind of input before trying to "fix" anything.

Dan.

ENDS - Mail from Dan Kennedy, 4 May 2014 - ENDS
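
For reference, the manual merge Dan mentions is driven through the FTS
table's special INSERT interface; for FTS3/4 it would look something like
this (the table name is hypothetical):

-- merge up to 500 blocks per call, whenever at least 2 segments share a
-- level (the smaller Y value, Y=2, suggested above)
INSERT INTO docs(docs) VALUES('merge=500,2');

-- merge everything into a single b-tree (the 'optimize' discussed above)
INSERT INTO docs(docs) VALUES('optimize');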


[sqlite] After deleting data from a FTS table and doing VACUUM, unwanted data remains

2020-02-25 Thread Matt Kloss
Dear sqlite users,

I noticed that when you delete lines from an FTS virtual table, some data
remains in the sqlite db, so that it does not shrink much in size.

$ sqlite3 test.sql "CREATE VIRTUAL TABLE tab USING fts5(x)"
$ curl -s https://www.wikipedia.org | tr -cd '[:alnum:][:space:]' > wikipedia
$ ls -lh wikipedia test.sql
-rw-r--r-- 1 mkloss mkloss 24K feb. 25 06:55 test.sql
-rw-r--r-- 1 mkloss mkloss 54K feb. 25 06:56 wikipedia
$ sqlite3 test.sql ".import wikipedia tab" && ls -lh test.sql
-rw-r--r-- 1 mkloss mkloss 148K feb. 25 06:56 test.sql
$ sqlite3 test.sql "delete from tab" && ls -lh test.sql
-rw-r--r-- 1 mkloss mkloss 148K feb. 25 06:56 test.sql
$ sqlite3 test.sql "VACUUM" && ls -lh test.sql
-rw-r--r-- 1 mkloss mkloss 124K feb. 25 06:56 test.sql

I would expect the db size to be 24K (not 124K), as it was when the table "tab" 
was empty.

I noticed that some data remains in the tab_XXX tables, but less than 700 
bytes. That's nowhere near the 100K of added cruft:

$ for t in tab_{config,content,data,docsize,idx}; do echo "select * from $t;"; done | sqlite3 test.sql | wc -c
682

So here are my questions:
(1) How do you really cleanup a db with FTS tables after deleting some lines?
(2) If there is no way to remove the cruft, does that mean that adding and 
deleting lines will constantly inflate the db size?
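
For reference, FTS5 exposes an 'optimize' command through the same special
INSERT interface, which merges all segments of the index into one; whether it
removes the remaining data here is exactly what is in question. A sketch
against the table above:

$ sqlite3 test.sql "INSERT INTO tab(tab) VALUES('optimize')"
$ sqlite3 test.sql "VACUUM" && ls -lh test.sql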

$ sqlite3 --version
3.31.1 2020-01-27 19:55:54 
3bfa9cc97da10598521b342961df8f5f68c7388fa117345eeb516eaa837balt1

Thank you for your help,
Regards,

Matthew