Re: [sqlite] Bug in SQLite version 3.31.1 and 3.32?

2020-03-06 Thread Richard Damon

On 3/6/20 9:54 PM, Keith Medcalf wrote:

On Friday, 6 March, 2020 19:25, Richard Damon  wrote:


It is sort of like NaN, where a NaN is neither less than, greater than,
nor equal to any value, including itself.

NULL (as in SQL NULL) means "missing value" or "unknown".  NULL represents any value 
within the domain, we simply do not know what that value is.  That is, the value "NULL" for the colour 
of a car means that we do not know the colour -- however, it still has one.

NaN, on the other hand, means that the value is outside the domain and that 
there is no possible value of the domain which will render the proposition true.

For example, the state of Schroedinger's Cat is NULL.  It has a state, either 
dead or alive.  That state is merely unknown until one looks in the box.  
However, if when you looked in the box there was no cat, then the cat would be 
a NaN, since its state is outside the domain of states for a cat in a box 
with a time-release poison after the release of the poison; that is, the 
non-existence of a cat in the box precludes the possibility of the state of the 
cat in the box being either dead or alive.


It may have a different meaning, but similar effects on logic.
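The parallel is easy to see directly. This is a small illustration (not from the original thread) comparing Python's float NaN with SQLite's NULL via the standard sqlite3 module:

```python
import sqlite3

# NaN compares false to everything, itself included.
nan = float("nan")
print(nan == nan, nan < nan, nan > nan)  # False False False

# SQL NULL comparisons yield NULL ("unknown"), which sqlite3 returns as
# None; only the IS operator treats NULL as a definite state.
con = sqlite3.connect(":memory:")
print(con.execute("SELECT NULL = NULL").fetchone()[0])   # None
print(con.execute("SELECT NULL IS NULL").fetchone()[0])  # 1
```

So while NaN comparisons are definitively false and NULL comparisons are merely unknown, both poison ordinary equality tests in similar ways.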

As an aside, for the quantum effect Schrodinger's Cat is designed to 
demonstrate, the cat ISN'T just one of dead or alive but not known which; 
it exists as a probability wave between the two states. This is why a 
photon that goes through one of two slits generates an interference 
pattern unless you detect which slit it goes through. If you measure the 
slit it went through, you get a different pattern of light: the lack 
of knowledge allows it to be less precise in its position, and the 
probability of going through the left slit interferes with the 
probability of that same particle going through the right slit, so the 
pattern implies it sort of went through both at once.


Detecting the state of Schrodinger's Cat actually changes its state, 
collapsing the wave into one of the definitive states.


--
Richard Damon

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug in SQLite version 3.31.1 and 3.32?

2020-03-06 Thread Richard Damon

On 3/6/20 9:00 PM, Simon Slavin wrote:

On 7 Mar 2020, at 1:52am, Xinyue Chen  wrote:


If I change IS NOT FALSE to IS TRUE, the results will be different. I
assume they should perform in the same way?

No.  Because NULL is not TRUE and is not FALSE.

SQLite version 3.28.0 2019-04-15 14:49:49
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> SELECT NULL IS TRUE;
0
sqlite> SELECT NULL IS FALSE;
0
sqlite> SELECT NULL IS NOT TRUE;
1
sqlite> SELECT NULL IS NOT FALSE;
1

Once you can have NULL values, you have to know the rules very well when you 
apply logic.  Other values make sense, but NULL is not logical.
It is sort of like NaN, where a NaN is neither less than, greater than,
nor equal to any value, including itself.


--
Richard Damon



Re: [sqlite] Performance Issue on Large Table

2020-02-23 Thread Richard Damon

On 2/23/20 3:06 PM, R.Smith wrote:


On 2020/02/23 21:23, Richard Damon wrote:

On 2/23/20 8:31 AM, Olaf Schmidt wrote:


An amount of 140 tables in such a "BibleVersions.db" is not
uncommon and can be managed by SQLite in a good performance.


I'm not sure that form of division would be good. One basic rule of 
database normalization is that you don't break up data based on the 
value of one of its fields, because the table to look up data from 
cannot then be chosen by a field value returned in a query.




It is a far less bad design than having an AFTER DELETE trigger 
doing trivial processing when you expect to delete 30K records at a 
time due to a "well-designed" 1NF schema.


Firstly, good design guides are only guides.
Secondly, having said that, either embrace the full idealism or none 
of it, but doing some things the good way and others not, hampers the 
"goodness" of the overall system.
Lastly, we usually use ON DELETE CASCADE Foreign Key relations to do 
what you are doing with the trigger, but of course you are using FTS5 
and I do not know if it can be done the FK way, or indeed how much the 
FK way will be better considering the same Indexes remain.



Note - I don't have a good solution here, if it was me I would 
immediately kill those triggers, see if it can be done smarter (or is 
needed at all), or even if it can be done in code - OR - get rid of 
the 1NF schema and go with the suggested Bible-per-Table method and 
keep the triggers (This will help nothing if the deletes are going to 
require deleting 1 record (and its trigger) each through 30K tables. 
It's only useful if it brings down the delete count).


If you cannot live without the triggers, nor can bring down the 
delete count, maybe do it in code. One quick way I can think of is 
using the ON DELETE to add those IDs to a "to_be_deleted" table and 
simply delete everything from the second table with "WHERE ID IN 
(SELECT ID FROM to_be_deleted)" before committing the transaction, 
perhaps combined with dropping and re-making the indexes -- but of 
course this needs testing and might not be suitable for reasons I am 
unaware of in your use case.
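The deferred-delete idea above might be sketched like this (the table, column, and trigger names are made up for illustration; the real schema uses FTS5 and will differ):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE verses(id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE notes(id INTEGER PRIMARY KEY, verse_id INTEGER, txt TEXT);
CREATE TABLE to_be_deleted(id INTEGER PRIMARY KEY);

-- Instead of doing the dependent delete per row, the trigger just
-- records the id for one set-based delete later.
CREATE TRIGGER verses_ad AFTER DELETE ON verses BEGIN
  INSERT INTO to_be_deleted(id) VALUES (old.id);
END;
""")
con.executemany("INSERT INTO verses VALUES (?, ?)",
                [(i, "v") for i in range(5)])
con.executemany("INSERT INTO notes(verse_id, txt) VALUES (?, ?)",
                [(i, "n") for i in range(5)])

with con:
    con.execute("DELETE FROM verses WHERE id < 3")
    # One set-based delete replaces one dependent delete per trigger firing.
    con.execute("DELETE FROM notes WHERE verse_id IN"
                " (SELECT id FROM to_be_deleted)")
    con.execute("DELETE FROM to_be_deleted")

print(con.execute("SELECT COUNT(*) FROM notes").fetchone()[0])  # 2
```

The point of the pattern is that the per-row trigger work becomes trivial (one insert), and the expensive dependent delete runs once per transaction instead of once per deleted row.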



Either way, you are probably in for some lengthy processing.

Best of luck!
Ryan 


My thought is that a modification (making multiple tables, one per 
version) that drops significant functionality (version is now hard-coded 
as part of the query, and can't be determined as part of a query) is 
not something to do lightly.  It also isn't that likely to improve 
things a lot unless it allows the massive delete cycle to be replaced 
with a drop table.


One big question is why the apparent wholesale deletion of a version 
happens at all, and, if it really is needed, whether a better schema 
could support that operation more directly.


--
Richard Damon



Re: [sqlite] Performance Issue on Large Table

2020-02-23 Thread Richard Damon

On 2/23/20 8:31 AM, Olaf Schmidt wrote:

On 21.02.2020 at 02:24, Chip Beaulieu wrote:
I have a table with 4.5 million records with full text indexing. 
Reads are very fast, but deleting / inserting / updating
takes on average about 50 seconds per record. I often do batches of 
30,000 deletes / inserts at a time. The last batch took 10 hours to 
complete.


These 30,000 deletes+inserts indicates that these were the
verse-records of a complete Bible-Content, right?

And your 4.5 million records total indicate that you have about
140 different Bible-versions in your DB?

I suspect it’s got something to do with the triggers more than the 
indexes.

Could be - but my guess is, that your primary DB-changes are caused
by "deleting or inserting verses of whole bible-contents".

So why not handle (hold) each bible in a separate table?

An amount of 140 tables in such a "BibleVersions.db" is not
uncommon and can be managed by SQLite in a good performance.


I'm not sure that form of division would be good. One basic rule of 
database normalization is that you don't break up data based on the 
value of one of its fields, because the table to look up data from 
cannot then be chosen by a field value returned in a query.


--
Richard Damon



Re: [sqlite] How to group this?

2020-02-10 Thread Richard Damon

On 2/9/20 11:44 PM, Rowan Worth wrote:

On Mon, 10 Feb 2020 at 11:12, Richard Damon 
wrote:


On 2/9/20 7:24 PM, Bart Smissaert wrote:

ID ENTRY_DATE TERM NUMERIC_VALUE ROWID

1308 15/Mar/2013 Systolic 127 701559
1308 15/Mar/2013 Diastolic 81 701568
1308 27/Jun/2013 Systolic 132 701562
1308 27/Jun/2013 Systolic 141 701563
1308 27/Jun/2013 Systolic 143 701564
1308 27/Jun/2013 Diastolic 82 701571
1308 27/Jun/2013 Diastolic 85 701572
1308 27/Jun/2013 Diastolic 94 701573
278975701 08/Mar/2018 Systolic 136 1583551
278975701 08/Mar/2018 Diastolic 99 1583591
278975701 04/Apr/2018 Systolic 119 1583552
278975701 04/Apr/2018 Systolic 124 1583553
278975701 04/Apr/2018 Systolic 130 1583554
278975701 04/Apr/2018 Diastolic 74 1583592
278975701 04/Apr/2018 Diastolic 75 1583593
278975701 04/Apr/2018 Diastolic 85 1583594

These are systolic and diastolic blood pressures for 2 people with the

ID's

1308 and 278975701, ordered by ID asc, ENTRY_DATE asc, ROWID asc.
Systolic and diastolic values are a pair and should be grouped in one

row.

This is no problem if there is only one pair for one date, but sometimes

there are multiple pairs per date.

The pairing should be based on the rowid if there are multiple pairs by
date, so for ID 1308
I should get:

127/81
132/82
141/85
143/94

What should be the SQL to group like this?

RBS

To be honest, I think the problem is fundamentally badly designed. You
say to pair the two readings by ROWID, but they of course don't have the
same ROWID; you seem to be saying to pair them sorted by ROWID (1st
to 1st, 2nd to 2nd, etc). The fundamental problem is: what if there isn't
the same number of each? You may say that you know that there will
always be the same number, but there is no constraint that forces this,
so any general program is going to have to deal with the possibility
(and at least raise an error when it sees it).


Yeah, it would have been easier to group the readings at write time - eg.
via another column storing the time of day or the "nth reading of the day".
You could still add the latter, post-hoc.

Note that ROWID is not persistent -- see quirk #6 here:
https://www.sqlite.org/rowidtable.html

I would expect that VACUUM's renumbering happens to maintain the row order,
but I doubt it's guaranteed. If you do have an INTEGER PRIMARY KEY it's
better to refer to that directly.

-Rowan


I think VACUUM maintains row order but not ROWID, unless ROWID has been 
aliased to a user-defined column.


Yes, one solution would be to add another column that can be used, either 
alone or paired with other columns, to allow a join to find the pair of 
readings. The writing application could then be updated to write 
the same value into that column for the two readings, and/or a fix-it 
app could be run to add it to existing data.


SQL itself is unlikely to have a simple way to do this, as the problem 
isn't solvable in general, which is what an SQL solution would be 
attempting, unless the SQL solution basically performed the fix-up 
operation: finding the duplicate ID/date rows and adding the pairing value.
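For completeness, if the counts do happen to match, the post-hoc pairing can be done in SQL with window functions (SQLite 3.25 or later). This is only a sketch using the sample data from the thread; note that the inner join silently drops unmatched readings instead of raising an error, which is exactly the weakness discussed above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bp(id INT, entry_date TEXT, term TEXT,"
            " value INT, rid INT)")
rows = [
    (1308, "2013-03-15", "Systolic", 127, 701559),
    (1308, "2013-03-15", "Diastolic", 81, 701568),
    (1308, "2013-06-27", "Systolic", 132, 701562),
    (1308, "2013-06-27", "Systolic", 141, 701563),
    (1308, "2013-06-27", "Systolic", 143, 701564),
    (1308, "2013-06-27", "Diastolic", 82, 701571),
    (1308, "2013-06-27", "Diastolic", 85, 701572),
    (1308, "2013-06-27", "Diastolic", 94, 701573),
]
con.executemany("INSERT INTO bp VALUES (?,?,?,?,?)", rows)

# Number the readings of each kind per (id, date) in rid order, then
# join the nth systolic to the nth diastolic.
sql = """
WITH numbered AS (
  SELECT id, entry_date, term, value,
         ROW_NUMBER() OVER (PARTITION BY id, entry_date, term
                            ORDER BY rid) AS n
  FROM bp
)
SELECT s.value || '/' || d.value
FROM numbered AS s JOIN numbered AS d
  ON s.id = d.id AND s.entry_date = d.entry_date AND s.n = d.n
WHERE s.term = 'Systolic' AND d.term = 'Diastolic'
ORDER BY s.id, s.entry_date, s.n
"""
print([r[0] for r in con.execute(sql)])
# ['127/81', '132/82', '141/85', '143/94']
```

The ISO date strings are an assumption made so that text ordering matches date ordering; the original data uses dd/Mon/yyyy, which does not sort correctly as text.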


--
Richard Damon



Re: [sqlite] How to group this?

2020-02-09 Thread Richard Damon

On 2/9/20 7:24 PM, Bart Smissaert wrote:

ID ENTRY_DATE TERM NUMERIC_VALUE ROWID

1308 15/Mar/2013 Systolic 127 701559
1308 15/Mar/2013 Diastolic 81 701568
1308 27/Jun/2013 Systolic 132 701562
1308 27/Jun/2013 Systolic 141 701563
1308 27/Jun/2013 Systolic 143 701564
1308 27/Jun/2013 Diastolic 82 701571
1308 27/Jun/2013 Diastolic 85 701572
1308 27/Jun/2013 Diastolic 94 701573
278975701 08/Mar/2018 Systolic 136 1583551
278975701 08/Mar/2018 Diastolic 99 1583591
278975701 04/Apr/2018 Systolic 119 1583552
278975701 04/Apr/2018 Systolic 124 1583553
278975701 04/Apr/2018 Systolic 130 1583554
278975701 04/Apr/2018 Diastolic 74 1583592
278975701 04/Apr/2018 Diastolic 75 1583593
278975701 04/Apr/2018 Diastolic 85 1583594

These are systolic and diastolic blood pressures for 2 people with the ID's
1308 and 278975701, ordered by ID asc, ENTRY_DATE asc, ROWID asc.
Systolic and diastolic values are a pair and should be grouped in one row.
This is no problem if there is only one pair for one date, but sometimes
there are multiple pairs per date.
The pairing should be based on the rowid if there are multiple pairs by
date, so for ID 1308
I should get:

127/81
132/82
141/85
143/94

What should be the SQL to group like this?

RBS


To be honest, I think the problem is fundamentally badly designed. You 
say to pair the two readings by ROWID, but they of course don't have the 
same ROWID; you seem to be saying to pair them sorted by ROWID (1st 
to 1st, 2nd to 2nd, etc). The fundamental problem is: what if there isn't 
the same number of each? You may say that you know that there will 
always be the same number, but there is no constraint that forces this, 
so any general program is going to have to deal with the possibility 
(and at least raise an error when it sees it).



--
Richard Damon



Re: [sqlite] loading extension csv.c

2020-02-07 Thread Richard Damon

On 2/7/20 3:14 PM, Jens Alfke wrote:



On Feb 7, 2020, at 9:11 AM, chiahui chen  wrote:

/usr/include/sqlite3ext.h:437:53: note: expanded from macro
'sqlite3_vsnprintf'

#define sqlite3_vsnprintf  sqlite3_api->vsnprintf

   ~~~  ^

/usr/include/secure/_stdio.h:75:3: note: expanded from macro 'vsnprintf'

  __builtin___vsnprintf_chk (str, len, 0, __darwin_obsz(str), format, ap)

  ^

This appears to be your problem. The system header <secure/_stdio.h> is 
defining `vsnprintf` as a macro that expands to a compiler builtin. This is 
conflicting with a struct field named `vsnprintf` in the SQLite extension API.

I've never heard of <secure/_stdio.h> before, although it does exist in the macOS SDK. 
Looking through the normal <stdio.h>, it does include that header at the end:

#if defined (__GNUC__) && _FORTIFY_SOURCE > 0 && !defined (__cplusplus)
/* Security checking functions.  */
#include <secure/_stdio.h>
#endif

So it looks like the trigger is that you're somehow building with 
_FORTIFY_SOURCE defined, and the others are not.

Anyway, I think you could work around the problem by editing csv.c and 
inserting something like this at the top:
#include <stdio.h>
#undef vsnprintf
Or else figuring out how to turn off _FORTIFY_SOURCE.

—Jens

PS: Your use of `gcc` in the command line confused me briefly — turns out `gcc` 
on macOS is simply an alias for `cc`, so it invokes Clang. If you really want 
GCC for some reason you'd have to install it yourself and put it in your $PATH 
before /usr/bin.

It looks like that header (sys/_stdio.h) is non-conforming. The C 
Standard does allow the stdio.h header to define a macro for the name 
vsnprintf, but that macro must be a *function-like* macro (7.1.4p1 in 
the C17 Standard), which this one appears not to be (a function-like 
macro wouldn't cause a problem with the shown code, since it only 
expands where the name is followed by a parenthesis).


--
Richard Damon



Re: [sqlite] Is mutliple-execute-one-commit slower than multiple single-execute-single-commit?

2020-01-29 Thread Richard Damon

On 1/29/20 1:42 AM, Peng Yu wrote:

Hi,

I have two python programs using sqlite3. They function the same,
except the following.

In the first, execute() is called in batches and then commit() is
called following them. In the second, commit() is called after each
execute(). It seems that the second case is faster (I can not separate
my code in a self-contained test case to show here).

This is counterintuitive. I thought the first should be faster.

Is it expected that the 2nd case should be slightly faster?

One thought is that if the executes create a lot of data for the 
database, then multiple executes could exceed the memory cache, causing 
it to spill to the database file; the commit then needs to read that back 
and put it into the right place, while a commit after each execute keeps 
everything in memory until the commit writes it to the database.
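A rough way to compare the two patterns is sketched below. This is an illustration only; actual timings depend heavily on the disk, the page cache size, and PRAGMA synchronous. Normally the single-commit version is far faster, since each COMMIT typically waits for the data to reach durable storage:

```python
import os
import sqlite3
import tempfile
import time

def insert_rows(path, rows, commit_each):
    # Insert `rows` rows, committing either once at the end or after
    # every single row.
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE t(x INTEGER)")
    t0 = time.perf_counter()
    for i in range(rows):
        con.execute("INSERT INTO t VALUES (?)", (i,))
        if commit_each:
            con.commit()
    con.commit()
    elapsed = time.perf_counter() - t0
    count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    con.close()
    return count, elapsed

tmp = tempfile.mkdtemp()
n1, one_commit = insert_rows(os.path.join(tmp, "a.db"), 500,
                             commit_each=False)
n2, per_row = insert_rows(os.path.join(tmp, "b.db"), 500,
                          commit_each=True)
print(n1, n2)  # 500 500
# Usually one_commit is much smaller than per_row; if the opposite shows
# up, cache spill as described above is one possible explanation.
```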


--
Richard Damon



Re: [sqlite] Find schema of a table in a query

2020-01-20 Thread Richard Damon

On 1/20/20 4:17 AM, Keith Medcalf wrote:

The "main" database is always seq == 0, the "temp" database is always seq == 1, 
and other databases are seq == 2 and greater in the order they were attached.  seq 2 -> whatever is 
always contiguous.  The table search order for unqualified names (when a search is required) is always 
in the seq returned by pragma database_list, so

select name
  from pragma_database_list as d
 where exists (select *
                 from pragma_table_info
                where schema == d.name
                  and arg == 'x')
 limit 1;

will always return the correct schema name, no order by required.  (And you can 
use table_info rather than table_xinfo because a table must always have at least one 
named column that is not hidden.  If this were not the case, then you would 
have to use table_xinfo to ensure that something is returned for that pragma 
lookup.)

One point: due to the nature of the fundamentals of SQL, the whole 
concept of 'first' means you MUST have an ORDER BY to get the item you 
want.  Yes, when you test, it may seem that the records always come in 
the order you want without it, but that is really happenstance, and 
might be changed by any number of things.


If the order that the SQL engine happens to choose to fetch the data 
does happen to be in the order specified by the ORDER BY, then a good 
engine will optimize it out, so it is free.
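Applied to the query above, the explicit ORDER BY looks like this (a sketch; the hidden `schema` and `arg` columns of `pragma_table_info` require a reasonably recent SQLite built with pragma table-valued functions enabled, which is the default):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE x(a)")
con.execute("ATTACH ':memory:' AS aux")
con.execute("CREATE TABLE aux.x(a)")

# Same lookup, but with an explicit ORDER BY so that "first" is defined
# by the query itself rather than by the engine's current fetch order.
sql = """
SELECT name
  FROM pragma_database_list AS d
 WHERE EXISTS (SELECT *
                 FROM pragma_table_info
                WHERE schema == d.name
                  AND arg == 'x')
 ORDER BY seq
 LIMIT 1
"""
print(con.execute(sql).fetchone()[0])  # main
```

Since pragma_database_list is naturally produced in seq order, the ORDER BY should cost nothing here while making the result order a guarantee instead of an accident.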


--
Richard Damon



Re: [sqlite] How can I detect rows with non-ASCII values?

2020-01-18 Thread Richard Damon

On 1/18/20 3:21 AM, Rocky Ji wrote:

Hi,

I am asked to highlight rows containing strange characters. All data were
ingested by a proprietary crawler.

By strange, I mean, question marks, boxes, little Christmas Trees,  solid
arrows, etc. kind of symbols; these appear suddenly in flow of normal ASCII
English letters.

How do I approach this?

Thanks.


The first thing you are going to need to find out to do this is how the 
data has actually been stored. The strange characters you are describing 
sound like somewhere some data was obtained in one character encoding but 
then interpreted as another.  This is a very old problem, without a 
solution in general except to always know the encoding of your data. It 
goes back to the development of the ASCII character set, which is the 
base for most character sets in use today (another major branch is 
EBCDIC, but that has its own issues and you tend to know when you need 
to deal with that).


The A in ASCII is for American: the ASCII character set was designed for 
American (i.e. English) data, and when it was developed, memory 
and bandwidth were limited and expensive, so you didn't waste them. 
ASCII was a compact set, using only 7 bits per character, and was fine 
for English information, with the basic alphabet. It became the base 
standard because America was a core part of initial computer 
development, and had a lot of influence on the initial standards.


While it worked well for American data, it didn't work so well for many 
other countries, so most other countries adopted their own character sets 
for use in their country, normally using ASCII as a base but adding 
codes to extend the coding to an 8-bit code, keeping (at least most of) 
ASCII as the first 128 values.


To exchange data between machines, you needed to include what 
character set the data was in (or you would see some funny words due to the 
mismatch).


Operating systems adopted the concept of Code Pages, which basically 
defined which of the many standard character sets was to be used, and 
some transfer formats actually included as part of the header 
information what character set the data that follows was in. One of 
these was the web pages that were on the Internet.


Later, to try to get out of this mess, a new character encoding called 
Unicode was invented. Unicode initially intended to provide a single 
universal encoding that would let any of the many standard encodings be 
converted to it. It was first thought that this 
could be done with a 16-bit character set, but it later needed to be 
enlarged as they found out how many different characters there 
really were. While memory isn't quite as precious as it was when ASCII 
was designed, it isn't so abundant that we could just increase the size 
of our data by a large factor, so a compact encoding was developed called 
UTF-8, which represents the ASCII characters exactly as the ASCII 
character set does, and all the extra characters with multiple bytes.


Because it did make files with the extra characters longer, and it was 
somewhat complicated to work with, and most people only worked with 
documents that could all be encoded in a single character set, its 
adoption was slow, but it is now becoming the norm, though still many 
things are in the old legacy encodings.


If you try to interpret a file that is in one of the legacy encodings as 
Unicode UTF-8, then (if it uses extended characters) it almost certainly 
will create decoding errors (UTF-8 was intentionally designed with 
redundancy in the encoding to ease processing, so many 'random' 
sequences become invalid). If you interpret a file that is in UTF-8 as 
a legacy encoding, you will tend to get some strange, out-of-place 
extended characters.


My first guess is that your proprietary crawler either didn't properly 
detect the page encoding and handle it (ideally it would have converted 
the data to UTF-8, but it might also have saved the data in the original 
encoding and recorded what that encoding was), or your read-out program 
isn't detecting the character set the data was stored as and processing 
it right. I believe SQLite assumes that 'TEXT' data is UTF-8 encoded, but 
other encodings can be declared or data stored as BLOBs.


What likely should happen is someone (maybe you) needs to read out 
samples of the funny data as a blob, and figure out how the data is 
actually encoded, ideally comparing it to the original page crawled, and 
once you know what the problem was, you can perhaps work on fixing it 
and detecting the records with problems.


One possible issue is that some conversion routines take characters they 
don't know how to handle and replace them with the ASCII Question Mark, 
and if that is what has been stored in the database, it may be very hard 
to distinguish that from an actual question mark in the data.
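As a starting point for the highlighting itself, rows containing anything outside printable ASCII can be found with a GLOB character class. This is a sketch (table and column names are made up), and note that it will also flag tabs and newlines, which may be perfectly legitimate:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs(id INTEGER PRIMARY KEY, body TEXT)")
con.executemany("INSERT INTO docs(body) VALUES (?)",
                [("plain ascii text",),
                 ("replacement char \ufffd here",),
                 ("box drawing \u2574 symbol",)])

# GLOB character classes: flag any row containing a character outside
# the printable ASCII range (space through tilde).
rows = con.execute(
    "SELECT id FROM docs WHERE body GLOB '*[^ -~]*'").fetchall()
print(rows)  # [(2,), (3,)]
```

This only finds non-ASCII characters; as noted above, data already mangled into an ASCII '?' cannot be distinguished this way.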


--
Richard Damon


Re: [sqlite] Unexplained table bloat

2020-01-13 Thread Richard Damon

On 1/13/20 5:24 AM, Dominique Devienne wrote:

On Mon, Jan 13, 2020 at 11:07 AM Keith Medcalf  wrote:

On Monday, 13 January, 2020 02:27, Dominique Devienne  
wrote:

I'd vote for a lengthof(col) that's always O(1) for both text and blob

So what should lengthof(something) return the number of bytes in the 'database 
encoding' or something else?

Bytes of course. Of the data stored, i.e. excluding the header byte
and encoded size (if any) from the file-format.
Basically the same as length() *except* for text values, resulting in
O(1) behavior. --DD

PS: I keep forgetting length(text_val) returns the number of
code-points in fact :)
PPS: Surrogate pairs count as one or two code points? That's just
bait, I don't really want to know :)))
Re the PPS, UTF-8 isn't allowed to have surrogate pairs. Non-BMP 
characters, which would use surrogate pairs in UTF-16, are supposed to be 
converted to their fundamental 21-bit value and that encoded into UTF-8. 
If the code doesn't validate the data well enough to catch that issue, 
then I suspect the character counting would count each half of a 
surrogate pair as a code point.
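The code-point-versus-byte distinction is easy to see from SQL; this is a small sketch using the sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# length() on TEXT counts code points; casting to BLOB first counts bytes.
cp, by = con.execute(
    "SELECT length('naïve'), length(CAST('naïve' AS BLOB))").fetchone()
print(cp, by)  # 5 6  ('ï' takes two bytes in UTF-8)

# A non-BMP character is a single code point in UTF-8 (no surrogates
# involved), even though UTF-16 would need a surrogate pair for it.
cp2, by2 = con.execute(
    "SELECT length('😀'), length(CAST('😀' AS BLOB))").fetchone()
print(cp2, by2)  # 1 4
```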


--
Richard Damon



Re: [sqlite] Unexplained table bloat

2020-01-12 Thread Richard Damon

On 1/12/20 5:25 PM, Tom Browder wrote:

On Sun, Jan 12, 2020 at 14:05 Keith Medcalf  wrote:


On Sunday, 12 January, 2020 09:03, Tom Browder 
wrote:

Am I missing something? I thought every column has to have a type?

Close, but no banana.  Every value has a type.  A column may contain
multiple values (as in one per row)


Thanks, Keith.

I assume that is just for SQLite, or am I wrong again?

-Tom
That the entries for a given column in different rows can have different 
types is a peculiarity of SQLite. In a 'Standard' SQL database, a column 
has a defined type, and all rows will have values of that type (or NULL).


--
Richard Damon



Re: [sqlite] Unexplained table bloat

2020-01-10 Thread Richard Damon

On 1/10/20 2:24 PM, Tim Streater wrote:

On 10 Jan 2020, at 18:55, Keith Medcalf  wrote:


On Friday, 10 January, 2020 11:44, Tim Streater  wrote:


On 10 Jan 2020, at 18:03, Richard Hipp  wrote:

On 1/10/20, Dominique Devienne  wrote:

There's no way at all, to know the length of a text column with
embedded NULLs?

You can find the true length of a string in bytes from C-code using
the sqlite3_column_bytes() interface. But I cannot, off-hand, think
of a way to do that from SQL.

But if I store UTF-8 in a TEXT column, surely I'm allowed to include
NULLs in that? They are after all valid UTF-8 characters.

No, they are not. The "NUL character" in Modified UTF-8 is the two-byte
sequence 0xC0 0x80. This is specifically so that 0x00 can be used as a string
terminator. Validly encoded UTF-8 encoded text stored in a C String (0x00
terminated sequence of bytes) must not contain an embedded 0x00 byte since
that byte terminates the sequence.

Nice, but Wikipedia has that as a "Derivative" and "incompatible with the UTF-8 
specification and may be rejected by conforming UTF-8 applications." It appears (though I may 
have missed it) not to be mentioned on this handy site either:

https://www.utf8-chartable.de/unicode-utf8-table.pl

I shall have to check what my preferred language's wrapper does.


It is incompatible, in the sense that it uses an encoding that the UTF-8 
specification says is invalid, and thus an application that fully performs 
all the tests on valid data forms would reject it. In many ways it 
is a compatible extension, in that excluding the test that specifically 
makes the form invalid, doing the processing by the general rules of 
UTF-8 gives the expected result.


C Strings do not allow 0 bytes in them. This would normally mean that 
they do not allow the NUL character to be in a string. This extension 
allows a character which would be interpreted as the NUL character to be 
represented without needing a 0 byte.


It should be pointed out that most libraries won't be checking all the 
strings that pass through them to see if they violate the rule, as that 
is just adding a lot of overhead for very little benefit. It is really 
expected that applications will do this sort of test at the borders, 
when possibly untrusted strings come in, and know that if good strings 
come in, the following processing will keep the strings valid.
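A sketch of how SQLite's length() interacts with an embedded NUL stored from a language (like Python) that allows NULs in its strings; length() on TEXT stops at the first NUL, while casting to BLOB reveals the full byte count:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(s TEXT)")
# Python strings may contain NUL, so the full 5-character value is bound.
con.execute("INSERT INTO t VALUES (?)", ("ab\x00cd",))

chars, nbytes, full = con.execute(
    "SELECT length(s), length(CAST(s AS BLOB)), s FROM t").fetchone()
print(chars, nbytes, repr(full))  # 2 5 'ab\x00cd'
```

So the data survives the round trip intact, but any C-string-based processing (including length()) sees only the part before the NUL.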


--
Richard Damon



Re: [sqlite] Unexplained table bloat

2020-01-10 Thread Richard Damon

On 1/10/20 1:43 PM, Tim Streater wrote:

On 10 Jan 2020, at 18:03, Richard Hipp  wrote:


On 1/10/20, Dominique Devienne  wrote:

There's no way at all, to know the length of a text column with embedded
NULLs?


You can find the true length of a string in bytes from C-code using
the sqlite3_column_bytes() interface.  But I cannot, off-hand, think
of a way to do that from SQL.

But if I store UTF-8 in a TEXT column, surely I'm allowed to include NULLs in 
that? They are after all valid UTF-8 characters.



As has been said, C strings (which is what TEXT is assumed to hold) are 
not allowed to include null characters, but are assumed to terminate at 
the first 0 byte. There is a variant of UTF-8, called Modified UTF-8 or 
MUTF-8, which allows a null character to be encoded as C0 80; this does 
decode to 0 by the base UTF-8 rules but is disallowed by the minimum 
encoding rule. It can be used to embed nulls in strings if the system 
doesn't enforce the minimum-length encoding rule (at least for this 
character).


I have no idea if that would work with SQLite though.

--
Richard Damon



Re: [sqlite] Causal profiling

2019-12-31 Thread Richard Damon

On 12/31/19 3:13 AM, Jens Alfke wrote:

On Dec 30, 2019, at 6:23 PM, Alexander Vega  wrote:

-fdevirtualize
-fdevirtualize-speculatively

I believe those are C++ optimizations, specific to C++ virtual methods. What 
SQLite is doing is similar to a class with virtual methods, but done “by hand” 
in plain C, so the optimizer doesn’t know how to apply that optimization.

(I’m fighting the urge to write an off-topic rant about why anyone still 
programs in C when they could write roughly the same code much more simply and 
cleanly in C++ using only the most basic features like classes and methods. 
I’ve spent too much time lately trying to figure out or debug hellacious C 
spaghetti code that’s awkwardly reinventing wheels like inheritance and 
constructors/destructors.)

And in any case, de-virtualization only works when a virtual method has only 
one implementation, I.e. is monomorphic, and that isn’t true here.

—Jens


I will give a short answer, in my opinion, of a few reasons on why C is 
still a viable language.


1) The C API between separate compilations is very well established, and 
fully documented in most environments. C++ is less so: important 
details like how to mangle names and how to handle exceptions are 
sometimes not fully standardized between compilers on a given platform.


2) The way language support packages work, it is fully defined to have a 
C library in a C++ program, but it might not work to have a C++ library 
in a C program: you need a C++-compatible startup package (to 
run the constructors etc.). Changing a library from C to C++ is thus a 
backwards-breaking change unless EVERYONE has migrated to C++.


3) The C language is much more stable. While some new features have been 
added to C in the last decade, the language has been very stable (yes, 
some legacy features that were included in C90 were later removed but 
that was for things that weren't recommended anyway). It is quite 
possible to have a code base designed to build on a C99 implementation 
(or even C90, perhaps with a few replacement headers like stdint.h for 
some very useful additions from C99) and have it still work with a 
totally up to date system. The C++ language has been much more fluid in 
language definition.


4) There still exist some embedded systems that don't have a free C++ 
compiler, so such a change restricts SQLite from some environments 
(admittedly, many of those may not have the resources to use SQLite well).


As an aside, SQLite's 'virtual' functions aren't implementing 
something easy to do in C++ (except by doing the same thing), as to do 
what is being done you would have to change the 'type' of the SQLite 
'object' at runtime.


--
Richard Damon



Re: [sqlite] 18 minutes 41 seconds

2019-12-30 Thread Richard Damon

On 12/30/19 10:10 PM, Pierpaolo Bernardi wrote:

On Tue, Dec 31, 2019 at 4:07 AM Keith Medcalf  wrote:


On Monday, 30 December, 2019 19:29, Michael Falconer 
 wrote:


As we approach the end of yet another year ( and indeed decade ).

Technically, every year is the end of a decade, if one means the immediately 
preceding ten years.

However, if you mean the end of the second decade of the 21st century, you will 
have to wait another year for that.  January 1st, 0001 AD was the first day of 
the year 1.  The first decade ended at the end of December 31st, 0010 AD, not 
December 31st, 0009 AD (if following the proleptic Gregorian calendar).

Languages don't work like this.

https://www.collinsdictionary.com/dictionary/english/decade

Cheers


It's a difference between ordinals and numerals. The 20th century ran 
from the beginning of 1901 to the end of 2000. We also have the century 
called the 1900s, which ran from 1900 to the end of 1999.


Decades work the same way: the 202nd decade runs from 2011 to the end 
of 2020, but the 2010s run from 2010 to the end of 2019.
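The two numbering schemes reduce to a couple of one-line helpers (a
sketch; the function names are made up):

```c
/* Ordinal counting starts at year 1: the 202nd decade is 2011..2020
 * and the 20th century is 1901..2000, so round up. */
int ordinal_decade(int year)  { return (year + 9) / 10; }
int ordinal_century(int year) { return (year + 99) / 100; }

/* Numeral-style labels: the "2010s" run 2010..2019, so truncate. */
int decade_label(int year) { return (year / 10) * 10; }
```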


--
Richard Damon



Re: [sqlite] Causal profiling

2019-12-30 Thread Richard Damon

On 12/30/19 10:19 AM, Doug wrote:

OK, I get that the definition of the SQLite API is a (large) set of C function 
calls. And that changing the way they work under the covers would break 
something. (Check out the IDEA below!)

I'm wondering if your use case is - in the same application - that you 
dynamically change from option SQLITE_CONFIG_SINGLETHREAD to 
SQLITE_CONFIG_MULTITHREAD to SQLITE_CONFIG_SERIALIZED while your application is 
running? If that is the case, then your application knows which option to use 
dynamically; Otherwise, your application doesn't know and doesn't care which 
option is in effect.

I am suggesting that if we added the global calls to the underlying 
functions to the API - that is, the functions that are called by the function 
table indirection - then one could code the application to call the underlying 
functions. If the application knows it's single-thread, then code it that way 
and get a 25% improvement (see the talk). If the application makes the choice 
of thread option dynamically, then the penalty for single-thread is at least 
double (application choice, SQLite indirection), so calling the (new) 
underlying function once you have made the choice performs better for that 
path. I grant that you probably will see little improvement on the threaded path.

If you are going to tell me that you need to maintain two versions of your 
application if you run it in a single-thread environment or a multi-thread 
environment, then let's define the (new) API to use a preprocessor macro to 
generate the right code for the option selected based on the values of 
SQLITE_CONFIG_SINGLETHREAD, et al. So now you have a single source but multiple 
executables corresponding to that source. And the choice of which executable to 
use becomes a configuration problem at application deployment time, or at 
application run time.

 IDEA! 
Thinking about it, I'm surprised that the C API isn't just a set of macros 
already. I can visualize a  C API composed of a set of macro definitions 
_identical_ to the current C function calls. They just use the extra knowledge 
of SQLITE_THREADSAFE and other SQLite compiler options to decide what 
application code to generate. Then the formal API doesn't change from a coding 
point of view. The generated code calls a set of under-the-cover functions 
which are not part of the API. The change doesn't require a new layer of 
testing; presumably, we already have test cases that test the same code using 
different compiler options. What about that?

Best, Doug

One BIG reason the C API can't be made into a set of macros is that the 
C API is basically the multi-language API for using the SQLite shared 
library in other languages. This requires that the C API functions be 
REAL functions that provide entry points into the shared library. Making 
them macros would also, perhaps as a much more minor point, prevent 
taking the address of those functions in the application, either to pass 
SQLite routines as callbacks or to build your own virtual functions 
(admittedly, I don't know of a case where you would really want to do that).
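The callback point is easy to see in plain C: a real function has an
address that can be stored and invoked later, while a function-like
macro does not. A sketch with invented names (square stands in for a
hypothetical API entry point):

```c
/* A real function: has an address, can be passed around. */
static int square(int x) { return x * x; }

/* A macro "function": expands inline at each call site.  Writing
 * &SQUARE_MACRO would not even compile -- there is nothing to point at. */
#define SQUARE_MACRO(x) ((x) * (x))

/* A dispatcher (stand-in for any callback-taking API) needs a real
 * entry point it can call through a pointer. */
static int apply(int (*fn)(int), int v) { return fn(v); }
```

Foreign-language bindings face the same constraint: a dlopen/dlsym-style
loader can only resolve real exported symbols, never macros.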


I suspect that the vast majority of SQLite users don't bundle the 
SQLite source code into their project, but link to it as an external resource.


Yes, there is perhaps an option to provide some specific configuration 
macros to allow SQLite to be optimized when included statically in a 
project, but those options shouldn't change the API.


--
Richard Damon



Re: [sqlite] How to determine the column type? – virtual table?

2019-12-14 Thread Richard Damon
On 12/14/19 1:55 PM, František Kučera wrote:
> Dne 14. 12. 19 v 18:20 Richard Damon napsal(a):
>> What he wants is different. He takes a basically arbitrary database
>> (user provided) and an arbitrary SQL statement (also user provided) and
>> he wants to determine what type a given column will present.
> Yes, I am looking for something that is usually called ResultSetMetaData.
>
> Currently I have solved it by adding a new command line option, so if the 
> user wants integer in e.g. "size" column, he must say it explicitly by: 
> --type-cast "size" integer.
>
> I can imagine a module, that will introduce optional static typing to SQLite. 
> It could provide a virtual table that will parse the query and return 
> expected result set metadata. So before executing the query, I could do 
> SELECT order, column_name, column_type FROM result_set_metadata WHERE sql = 
> 'SELECT mount_point, pass+1000 AS pass FROM fstab'; and it would return:
>
> result_set_metadata:
>  ╭─────────────────┬──────────────────────┬──────────────────────╮
>  │ order (integer) │ column_name (string) │ column_type (string) │
>  ├─────────────────┼──────────────────────┼──────────────────────┤
>  │               1 │ mount_point          │ string               │
>  │               2 │ pass                 │ integer              │
>  ╰─────────────────┴──────────────────────┴──────────────────────╯
> Record count: 2
>
> to do this, it would have to:
>
>  - parse the SQL (I am not sure whether internal SQLite parser could be used 
> for it or if I had to do it myself)
>
>  - check whether requested tables and columns exist and check the declared 
> types
>
>  - analyze the operations done in the SELECT clause (operators, function 
> calls or CASE) and derive resulting type
>
>  - return more generic type if types varies e.g. CASE that returns integer or 
> decimal numbers will result in a decimal type
>
>  - return an error if the CASE mixes e.g. integers and strings which does not 
> make much sense and is probably a mistake
>
> expectations and limitations:
>
>  - the table contains only values of declared types
>
>  - null is not perceived as a type itself but rather as a value of another 
> type (missing integer, missing string etc.)
>
> This is IMHO feasible, but would mean probably a lot of work. However it 
> might be beneficial for those who come from other relational databases and 
> prefer static typing rather than dynamic.
>
> i.e. something like optional type hints and checks in dynamic programming 
> languages.
>
> Franta

I think the biggest part of the issue is that you are thinking in
'Standard SQL' while giving the user free rein in what SQL they are
going to use (so they COULD use some of the relaxations of limitations
provided by SQLite).

One big issue with your proposed module is that it (or at least programs
using it) will rely on its assumptions, but there is no easy way to
enforce them, as a program that doesn't use the module could access the
database and break them.

-- 
Richard Damon



Re: [sqlite] How to determine the column type? – explicit cast

2019-12-14 Thread Richard Damon
On 12/14/19 11:12 AM, Simon Slavin wrote:
> On 14 Dec 2019, at 10:46am, František Kučera  wrote:
>
>> SELECT cast(dump+100 AS integer) FROM fstab;
>>
>> the sqlite3_column_decltype() still does not return the integer type.
>>
>> Would it be possible to modify this function or add a new one, to tell the 
>> correct type at least if there is an explicit cast like this in given query?
> It works fine for me:
>
> SQLite version 3.28.0 2019-04-15 14:49:49
> Enter ".help" for usage hints.
> Connected to a transient in-memory database.
> Use ".open FILENAME" to reopen on a persistent database.
> sqlite> CREATE TABLE t (a TEXT, b INTEGER, c REAL);
> sqlite> INSERT INTO t VALUES ('123', 123, 123.4);
> sqlite> INSERT INTO t VALUES (CAST ('456' AS INTEGER), CAST (456 AS INTEGER), 
> CAST (456.7 AS INTEGER));
> sqlite> SELECT a,typeof(a),b,typeof(b),c,typeof(c) FROM t;
> 123|text|123|integer|123.4|real
> 456|text|456|integer|456.0|real
> sqlite> SELECT cast(a AS INTEGER),typeof(cast(a AS INTEGER)),cast(b AS 
> INTEGER),typeof(cast(b AS INTEGER)),cast(c AS INTEGER),typeof(cast(c AS 
> INTEGER)) FROM t;
> 123|integer|123|integer|123|integer
> 456|integer|456|integer|456|integer
>
> When you do your CAST when you store, and the column type is compatible with 
> the input value, the column type is what you declared the column type to be.
>
> But whether you do your cast() when you recall, the column type is always 
> INTEGER.  Which is what you want.

What he wants is different. He takes a basically arbitrary database
(user provided) and an arbitrary SQL statement (also user provided) and
he wants to determine what type a given column will present.

He is willing to assume that columns hold their declared data type
(otherwise the problem is truly impossible), and if a result maps back
to just a column, the answer is simple: the declared type of the
column (even though in many cases the answer could also be NULL). The
problem is that there is no built-in algebra to deduce what type an
expression will produce (assuming you know the types of the inputs), in
part because in SQLite you can't 'know' the types the inputs will be,
and in part because sometimes the result type will depend on the values
received.

The only answers I can think of are

1) Retrieve all the results, taking whatever type SQLite says each value
is, and then process all the results for a given column to figure out
what (if any) single type that column produces (it might be
inconsistent, so you may need to report 'mixed' and perhaps handle mixed
INTEGER and REAL some special way).

2) Parse the expression yourself and determine the types (and know ahead
of time if there are problem columns). There might be ways to use some
internals of SQLite to help, but SQLite isn't going to do the job
itself, as it has no need for that answer (the answer rests on
assumptions about the data that SQLite doesn't make).
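Option (1) amounts to folding the per-value types of a column into one
reported type. A sketch of such a merge (the enum and names are
invented; only the INTEGER/REAL pair widens, everything else degrades
to 'mixed'):

```c
/* T_NONE means "no non-NULL value seen yet", so NULLs never change the
 * accumulated type; INTEGER and REAL merge to REAL; any other mismatch
 * means the column is genuinely inconsistent. */
enum col_type { T_NONE, T_INTEGER, T_REAL, T_TEXT, T_BLOB, T_MIXED };

enum col_type merge_type(enum col_type acc, enum col_type v)
{
    if (v == T_NONE || v == acc) return acc;   /* NULL or same type    */
    if (acc == T_NONE) return v;               /* first non-NULL value */
    if ((acc == T_INTEGER && v == T_REAL) ||
        (acc == T_REAL && v == T_INTEGER))
        return T_REAL;                         /* numeric widening     */
    return T_MIXED;                            /* inconsistent column  */
}
```

Feeding each sqlite3_column_type() result through this fold, row by
row, yields either a single type, T_NONE (all NULLs), or T_MIXED.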

-- 
Richard Damon



Re: [sqlite] How to determine the column type?

2019-12-14 Thread Richard Damon
On 12/14/19 5:22 AM, František Kučera wrote:
> Dne 14. 12. 19 v 9:36 Darren Duncan napsal(a):
>> With respect to SQLite, every column is the union of: Null, every Integer, 
>> every Float, every Text, every Blob.
> OK, we can close this thread with that it is currently impossible to track 
> the declared column types through relational operations.
>
It is currently impossible because it is actually impossible to know the
result type of some expressions except by knowing the values and
computing it. Some arithmetic operations have their result type vary
based on the value of the result (overflow goes to REAL), a CASE
expression can return different types based on the value of an
expression, and then you have the fact that NULL has a unique type.

A common issue with computer systems is that when something looks easy
to do for a lot of simple cases, but there are some cases that are hard
or impossible to determine, those hard cases make it hard to handle the
general problem.

Thus even without using SQLite's flexibility in the types stored in
columns, you can't predetermine the type of some expressions.

-- 

Richard Damon



Re: [sqlite] How to determine the column type?

2019-12-13 Thread Richard Damon
On 12/13/19 7:16 PM, František Kučera wrote:
> Dne 14. 12. 19 v 0:09 Keith Medcalf napsal(a):
>> On Friday, 13 December, 2019 15:49, František Kučera 
>>  wrote:
>>
>>> I know that SQLite uses dynamic types, so it is not easy… But what is the
>>> best way to determine the column type of a result set?
>> Result sets do not have "column types".  Each result value (the intersection 
>> of row and column) has a type associated with it.
> I know that SQLite is that flexible, but I am using just a subset of its 
> features and looking for a way how to propagate the types through the queries 
> and result sets.
>
>> Your expectation would be incorrect. You have to call sqlite3_column_type 
>> for each column of each row in order to determine the type of data contained 
>> there. Every intersection of row and column can contain data of any type. 
>> Think of an Excel (or VisiCalc) spreadsheet. Just because you labeled a 
>> column as containing only integers does not mean that someone did not put 
>> something else there.
> Yes, I can do:
>
> sqlite> create table t (i integer, f float, s text);
> sqlite> insert into t values (1, 1.0, 'abc');
> sqlite> insert into t values (1.5, 1.0, 'abc');
> sqlite> insert into t values (1.5, 'xxx', 'abc');
> sqlite> select * from t;
> 1|1.0|abc
> 1.5|1.0|abc
> 1.5|xxx|abc
> sqlite> select typeof(i), typeof(f), typeof(s) from t;
> integer|real|text
> real|real|text
> real|text|text
>
> but this would be unsupported scenario and the user will expect, that if he 
> declares an integer column, he could put only integers into it. If the types 
> become into such inconsistent state, then my software could crash, throw 
> exception, refuse working… it is OK (user can recover his data directly 
> through SQLite, but it is not expected that this would happen).
>
> In case of my software I can really expect that all values in a column will 
> have the same type or be NULL (and everything else means error).
>
> So if I have a table with an integer column and I do e.g. a +1 operation on 
> it, the database has theoretically everything it needs to say that the 
> resulting type will be also integer. Or can the "+" operation result anything 
> than number (or null or error)?
>
> Would not it be useful to have optional function to determine the types 
> before executing the query? (yes, it would require that the columns contains 
> only values of declared type… but it is quite obvious and who call such 
> function will expect such constraint… and it might also require adding some 
> optional metadata to existing functions – hint what types they return…).
>
> Franta

As I said in my reply, you can keep asking for the type of columns that
previously returned NULL until you get a type.

Part of the issue is that you ARE allowing multiple types (since NULL is
its own type), so you need to be prepared for differing types.

One big thing to watch out for is that columns of NUMERIC type can
easily return values of either INTEGER or REAL type. Your single-type
expectation is easily broken here. I also don't know if
9223372036854775807 (the biggest integer value) from an INTEGER field +
1 gives a REAL result, or some incorrect INTEGER value.
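For what it's worth, my understanding is that current SQLite versions
produce a REAL result when integer addition overflows, rather than
wrapping, though it is worth verifying against your own build. The
check itself looks like this (a sketch; the struct and names below are
invented, not SQLite's internals):

```c
#include <stdint.h>

/* A value that is either an exact 64-bit integer or a double. */
typedef struct { int is_real; int64_t i; double r; } num;

/* Checked 64-bit addition: detect overflow *before* performing the
 * add (signed overflow is undefined behavior in C), and promote the
 * result to double instead of wrapping. */
num checked_add(int64_t a, int64_t b)
{
    num out = { 0, 0, 0.0 };
    if ((b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b)) {
        out.is_real = 1;
        out.r = (double)a + (double)b;   /* lossy but in range */
    } else {
        out.i = a + b;                   /* exact integer result */
    }
    return out;
}
```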

-- 
Richard Damon



Re: [sqlite] How to determine the column type?

2019-12-13 Thread Richard Damon
On 12/13/19 5:49 PM, František Kučera wrote:
> Hello,
>
> I know that SQLite uses dynamic types, so it is not easy… But what is the 
> best way to determine the column type of a result set?
>
> The sqlite3_column_decltype() works only if I select directly a column, but 
> not when I do some other operations (call function, increment etc.).
>
> The sqlite3_column_type() works for while iterating over particular rows. I 
> can fetch the first row and get type here (expecting that all values in that 
> column will have same type), but the problem is a) if the first value is NULL 
> or b) if the result set is empty.
>
> If I have e.g. "SELECT a + 1 AS x FROM t", is there a way to say that the x 
> will be numeric? Or if I have "SELECT 1+1 AS x"?
>
> I am writing a generic software that should work with any database model 
> (tables are defined by the user) and I need to know the types, so I can 
> properly present the results. Currently I use sqlite3_column_decltype() and 
> will add options so the user could explicitly specify types of particular 
> columns, but I am looking for a better way…
>
> Or will SQLite4 work differently (for me better) with types?
>
> Thanks,
>
> Franta
>
My guess is that your statement '(expecting that all values in that
column will have same type)' is where your problems arise. The SQLite
model doesn't assume that. A column in a result set doesn't necessarily
have *A* type. What do you want your software to do if the data in the
database has differing types in a given column?

If you are going to enforce a uniform type (except for allowing NULL as
a value), then you could check your recorded column type for each column
of each row: if it is currently recorded as NULL, check the type in this
row and update if needed; otherwise use the recorded type. Note that you
need to be prepared for different queries of the same set of columns (or
the same query at different times) to give you changing types for a
given column.

-- 
Richard Damon



Re: [sqlite] SLOW execution: Simple Query Uses More than 1 min

2019-12-09 Thread Richard Damon
On 12/9/19 4:25 PM, Keith Medcalf wrote:
>> You could still have fast forking without overcommitting, you’d just pay
>> the cost in unreachable RAM.
>>
>> If I have 4 GB of RAM in the system, and the kernel takes 1 GB of that, I
>> start a 2.5 GB user space process, and my process forks itself with the
>> intent of starting an 0.1 GB process, that fork would have to fail if
>> overcommitting weren’t allowed.
> No, it wouldn't, and there is no overcommitment.  You are creating a second 
> process that is using the same V:R mapping as the original process thus it is 
> consuming no more virtual memory after the fork operation than before (except 
> for the bytes to track the new process).  You now have two execution paths 
> through the same mapping which may require more real memory working set, but 
> you have not increased the virtual memory size.  That is until one of the 
> processes modifies a memory page in which case an additional virtual page 
> must be allocated to hold the modified page.
>
> Overcommittment occurs at the R level, not at the second V in the V:V:R 
> mapping.
>
> This is why shared libraries (and discontiguous saved segments) were 
> invented.  It permits the per-process mapping (the first V in V:V:R) to use 
> already existing virtual pages (the second V in V:V:R) without increasing the 
> count of Virtual Pages.  It is not overcommittment unless the number of 
> virtual pages (the second V in V:V:R) exceeds the number of pages in R plus 
> backing store.
>
Delaying the conversion of shared pages to distinct ones (or at least
delaying the reservation of backing store) is one form of
overcommitting. If we assume that overcommitting has been removed, then
the fact that the fork succeeded is the promise that both processes have
the right to access all of their address space. Any page that is
writable needs to have swap space reserved, or you have allowed
overcommitting. The OS can delay actually creating the new pages, and
thus save some work, but if you haven't reserved the space for the
virtual page, you are allowing an overcommit.

-- 
Richard Damon



Re: [sqlite] SLOW execution: Simple Query Uses More than 1 min

2019-12-09 Thread Richard Damon
But without virtual memory, many application combinations that work acceptably 
now would simply fail to run at all. Virtual memory itself isn’t the issue. 
Also, an OS could fairly easily be set up so that an application that starts to 
thrash its virtual memory is dropped in priority for getting memory, and even 
for getting pages swapped in, so that other applications have their operations 
only minimally impacted.

One of the issues is due to the Linux fork/exec model. If a process wants to 
start a process that runs in parallel to it, Linux (to my understanding) 
doesn’t have an easy call to just start up a brand new process with parameters 
from you; instead a process will fork itself, creating two identical copies, 
one of which continues while the other execs the new program, replacing itself 
with the desired process. The act of forking SHOULD allocate all the virtual 
memory for the copy of the process, but that would take a bit of time. Because 
most of the time all that memory is just going to be released a couple of 
instructions later, it made sense to postpone the actual allocation until the 
memory was actually used (which it likely wasn’t). This ‘optimization’ was so 
‘complete’ that the system didn’t really keep track of how much memory had been 
promised to the various processes, so the system allowed itself to overcommit 
memory, and if it actually did run out, it didn’t have a good way to determine 
who was at fault, and no way to tell a process that the memory promised to it 
earlier isn’t really available.

Fixing the issue is more of a political problem. With the current system, when 
a problem arises, you can normally find a user program or something the user 
did that was ‘bad’ and can be blamed for the problem. If the system were 
changed to not allow overcommitting, then forking would be slower, which hits 
all of the standard system routines.
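The fork/exec pattern being described looks like this in outline (a
minimal sketch using POSIX calls, with error handling trimmed; the
helper name is invented):

```c
#include <sys/wait.h>
#include <unistd.h>

/* Duplicate this process; the child immediately replaces its
 * (copy-on-write) image with a new program, and the parent reaps the
 * child's exit status.  Returns the exit code, or -1 on failure. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) return -1;            /* fork failed */
    if (pid == 0) {
        execv(path, argv);             /* replaces the child's image */
        _exit(127);                    /* reached only if exec failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Every byte the child "inherits" between fork() and execv() is exactly
the address space whose lazy, copy-on-write duplication makes
overcommit attractive.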

> On Dec 9, 2019, at 8:39 AM, Digital Dog  wrote:
> 
> For reasons which you've described I'm a big fan of removing virtual memory
> from CPUs altogether. That would speed up things considerably.
> 
>> On Sun, Dec 8, 2019 at 6:43 PM James K. Lowden 
>> wrote:
>> 
>> On Sat, 7 Dec 2019 05:23:15 +
>> Simon Slavin  wrote:
>> 
>>> (Your operating system is allowed to do this.  Checking how much
>>> memory is available for every malloc takes too much time.)
>> 
>> Not really.  Consider that many (all?) operating systems before Linux
>> that supported dynamic memory returned an error if the requested amount
>> couldn't be supplied.  Some of those machines had 0.1% of the
>> processing capacity, and yet managed to answer the question reasonably
>> quickly.
>> 
>> The origin of oversubscribed memory rather has its origins in the
>> changed ratio of the speed of RAM to the speed of I/O, and the price of
>> RAM.
>> 
>> As RAM prices dropped, our machines got more RAM and the bigger
>> applications that RAM supported.  As memory got faster, relatively, the
>> disk (ipso facto) has gotten slower. Virtual memory -- the hallmark of
>> the the VAX, 4 decades ago -- has become infeasibly slow both because
>> the disk is relatively slower than it was, and because more is being
>> demanded of it to support today's big-memory applications.  Swapping in
>> Firefox, at 1 GB of memory, who knows why, is a much bigger deal than
>> Eight Megabytes and Constantly Swapping.
>> 
>> If too much paging makes the machine too slow (however measured) one
>> solution is less paging.  One administrative lever is to constrain how
>> much paging is possible by limiting the paging resource: swap space.
>> However, limiting swap space may leave the machine underutilized,
>> because many applications allocate memory they never use.
>> 
>> Rather than prefer applications that use resources rationally or
>> administer machines to prevent thrashing, the best-effort, least-effort
>> answer was lazy allocation, and its infamous gap-toothed cousin, the
>> OOM.
>> 
>> Nothing technical mandates oversubscribed memory.  The problem, as
>> ever, is not with the stars, but with ourselves.
>> 
>> --jkl
>> 
>> 



Re: [sqlite] built-in printf() not supporting \n or \t ?

2019-11-28 Thread Richard Damon
On 11/28/19 10:24 AM, Dominique Devienne wrote:
> Obviously it's ugly to use concatenation and char() to format a string
> literal with tabs and newlines.
> Is there a better way? Why doesn't printf() support newlines and tabs like
> it's C cousin? --DD
>
> PS: Built-in printf() also doesn't support positional params, to "emulate"
> newline with printf( '%1$s1: %2$s%1$s2: %3$s%1$s' , char(10), 'one',
> 'two'), but that's not too readable either, in any case
>
> sqlite> select printf('\n1: %s\n2: %s\n', 'one', 'two');
> \n1: one\n2: two\n
> sqlite> select char(10)||printf('1: %s', 'one')||char(10)||printf('2: %s',
> 'two')||char(10);
>
> 1: one
> 2: two
>
> sqlite> select printf('\t1: %s\t2: %s\t', 'one', 'two');
> \t1: one\t2: two\t
> sqlite> select char(9)||printf('1: %s', 'one')||char(9)||printf('2: %s',
> 'two')||char(9);
> 1: one  2: two
> sqlite>

A couple of things to note:

\n and \t are not 'printf' features but C string-literal features: \ is
an escape introducer when compiling a string literal, and when followed
by a letter like n or t it builds a string containing the special
character represented by that escape. The \n does NOT make it into the
string itself, only the code for a newline. You are not typing your line
into a C compiler but into the sqlite shell program, which builds
strings differently.
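The distinction is easy to demonstrate in C itself (a small sketch):
the escape is consumed by the compiler, so the runtime string holds a
single newline byte, whereas the two characters backslash-n — what the
sqlite shell actually receives when you type \n at its prompt — stay
two ordinary characters:

```c
/* "\n" is translated at compile time into one newline byte (0x0A).
 * "\\n" is a backslash followed by 'n' -- the two characters the
 * sqlite shell sees, since it applies no C-style escape processing. */
static const char compiled_escape[] = "\n";
static const char literal_chars[]   = "\\n";
```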

ISO C also does not support positional arguments; that is an extension
many compilers provide, as I believe it is a common extension used on
Linux.

You also don't NEED positional arguments; you could use

printf('%s1: %s%s2: %s', char(10), 'one', char(10), 'two')


There is a discussion of why SQLite uses its own printf here:
https://sqlite.org/printf.html

-- 
Richard Damon



Re: [sqlite] Regarding the whole C89/C90 language compliance debacle ...

2019-11-23 Thread Richard Damon
On 11/23/19 6:35 PM, Simon Slavin wrote:
> On 23 Nov 2019, at 11:06pm, Richard Hipp  wrote:
>
>> given the choice between
>>
>> (1) Code that works and does something useful
>> (2) Code that is standards compliant
>>
>> I'll always go with (1).
> Another problem is that different compilers, or the same compiler with 
> different options, warn about different things.  And that making changes to 
> make one compiler happy can make another compiler unhappy.  Until you end up 
> with
>
> complicated line here;/* actually does a = b but
>   must keep four compilers happy */

I consider it an error in a specification if it requires warning-free
builds without specifying which compiler and options that requirement
needs to be met on. It can be shown to be impossible to get totally
warning-free code on arbitrary compilers.

As to the original question, if the program uses the type long long,
then it can't be C89/C90 compatible, as that type didn't exist then. It
might be that, given some option, the program could build a variation
that doesn't use long long for very old systems.
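The usual workaround is a small compile-time shim. A sketch (the
typedef name is invented; SQLite itself does something similar for its
sqlite3_int64 type):

```c
/* Pick a 64-bit integer type on C99-and-later compilers, falling back
 * to the widest C89 type otherwise.  The fallback may be only 32 bits
 * on many ABIs, so code relying on 64-bit range must account for it. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>
typedef int64_t my_i64;
#else
typedef long my_i64;   /* best available pre-C99 */
#endif
```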

As to the conversion between function pointers and data pointers, this
ability is not specified in the C Standard, and on some machines it
actually can't be done. POSIX compatibility requires that it work, at
least in some limited situations, as an extension. (Originally there was
some confusion over whether it required a diagnostic in fully conforming
mode or not; of course, enabling POSIX would suppress that message.)
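A standard-conforming way to move a function pointer through code that
traffics in object storage is to copy its bytes rather than cast — the
classic workaround for dlsym()-style interfaces (a sketch; the names
are invented):

```c
#include <string.h>

static int forty_two(void) { return 42; }

/* Round-trip a function pointer through a plain byte buffer with
 * memcpy.  Copying the *representation* is well-defined even though a
 * direct function-pointer/data-pointer cast is not specified by ISO C. */
int call_through_copied_pointer(void)
{
    typedef int (*fn_t)(void);
    fn_t src = forty_two;
    unsigned char raw[sizeof(fn_t)];   /* object-representation buffer */
    fn_t dst = 0;
    memcpy(raw, &src, sizeof src);     /* function pointer -> bytes */
    memcpy(&dst, raw, sizeof dst);     /* bytes -> function pointer */
    return dst();
}
```

This sidesteps the cast entirely; it still assumes function and data
addresses share a representation, which POSIX guarantees but ISO C
does not.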

Sometimes code is just easier to write (or performs better) if you use
extensions provided by an implementation. Doing so limits which
compilers can be used, but some GCC extensions are just so commonly
available on a variety of platforms that the limitation is minor.

-- 
Richard Damon



Re: [sqlite] Optimising query with aggregate in subselect.

2019-11-20 Thread Richard Damon
On 11/20/19 1:26 PM, Simon Slavin wrote:
> On 20 Nov 2019, at 6:11pm, Andy Bennett  wrote:
>
>> In past attempts at improving query performance these have been added to 
>> encourage it to use an index that it can do a SCAN thru' rather than the 
>> table that it would need to do a SEARCH thru'.
> SQLite is not using the PRIMARY INDEX to immediately locate the appropriate 
> row, but is actually faster when you fake it into using a longer index ?  
> That's weird.

I think the issue is that the 'Primary Index' isn't really the primary index: 
the implied ROWID (since the table isn't WITHOUT ROWID and the primary key 
isn't an INTEGER PRIMARY KEY) is the real primary key. Thus a lookup of a row 
by the declared primary key is a two-step lookup: find the key in the unique 
index for that key, then look up the specified record by ROWID. If there is an 
index containing the needed data, the second lookup isn't needed.

-- 
Richard Damon



Re: [sqlite] mport SQLite import Latin1 data?

2019-11-15 Thread Richard Damon
On 11/15/19 4:45 PM, Winfried wrote:
> It's odd that SQLite doesn't support this, since it's not uncommon to have
> big Latin1 input files that can take a while to load in e.g. Excel.
>
> For others' benefit, GNUWin32's iconv is very fast:
>
> "C:\Program Files\GnuWin32\bin\iconv.exe" -f "windows-1252" -t "UTF-8"
> "input.1252txt" > "output.UTF8.txt"
>
> Thank you.
>
I think it is because SQLite expects the application to handle any such
needed conversion.

Note that .import isn't an SQL command, but a command of the SQLite
shell. I don't think the shell is intended to be a 'do everything' tool,
but a convenience and quick-operation tool. As your command-line example
shows, it isn't that hard to do the conversion with other tools.

-- 
Richard Damon



Re: [sqlite] Things you shouldn't assume when you store names

2019-11-13 Thread Richard Damon
On 11/13/19 3:25 AM, Wout Mertens wrote:
> Fascinating discussion, and threads like this are why this is the only
> mailing list that always triggers my "important" flag :)
>
> My problem with names isn't the number of fields needed to present them (I
> liberally use JSON1), but the operations that are possible on them, and the
> UI needed to enter this data.
>
> I want to know, given a world where people have nicknames, no last names,
> gender fluidity, customizable pronouns, honorifics, super short and super
> long names, non-latin characters etc
>
>- How do I create an email greeting
>- How do I create an address label
>- How should I sort a list of names
>- How should I show a logged in user
>- How do you let the user fill in their name
>
> I'm thinking that maybe the name filling in would just take the full name
> as they want to write it, and if their name has non-latin charset
> characters, it opens another field where they can enter their "western
> name".
>
> Then there's the matter of honorific, which is a searchable dropdown (all
> languages combined I think) that lets you add custom honorifics if needed.
>
> While they are filling in the fields, there's a preview section that shows
> derived data, like email:"Dear [honorific] [lastname]",
> profile:"[fullname]" etc, where the user can pick if they'd rather be
> addressed with "Hi [firstname]", "To: [alias]" etc. Depending on the
> application, it could even show the name in various declensions (e.g. in
> slavic languages), or custom pronouns.
>
> It's complex, but it could be a nice UI component to open source, and it
> would result in an object with the various derivations of the name that
> you'd store schemaless.
>
> However, it could be that people will feel threatened or reduced to a
> number with that sort of interface. I have no idea how to fix that.
>
> Wout.
>
>
> Wout.

The only answer that I can give to most of those is that if you want to
do it 'right', you need to ask the user for that information, with an
explanation of why you want it. You can perhaps start with some of the
fields pre-filled from previous answers and let the user edit them.

The one thing you might not need is how to sort, because unless you are
producing a directory listing for people to scan to find a person (and
if you are, you probably should be asking explicit permission to do so),
you generally don't really need to sort names, just be able to search
them, and there wildcard searches are often much better at finding who
you are looking for.

-- 
Richard Damon



Re: [sqlite] Things you shouldn't assume when you store names

2019-11-12 Thread Richard Damon
On 11/12/19 2:42 PM, Michael Tiernan wrote:
> On 11/10/19 1:21 AM, Gary R. Schmidt wrote:
>> So what happens when someone from a family who only uses first- and
>> last-names moves to Kansas?
>>
>> Do they have to make up a middle-name so that he idiots can fill out
>> the forms? 
>
> I am most definitely not going to take one side or the other. My only
> suggestion is for anyone to see the depth and complexity of the
> problem, get involved in genealogy. You'll want to scream very
> quickly. :) 

Yep, I AM involved in genealogy, and there you not only want to record
the person's name, but you have a real reason to group people by 'last
name', as that is one hint that they might be related. But there are all
sorts of rules in different cultures about family names (we can't really
call them 'last names', as they aren't always last, and 'surnames' isn't
accurate either).

-- 
Richard Damon



Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Richard Damon
On 11/11/19 2:57 PM, Jose Isaias Cabrera wrote:
> Igor Tandetnik, on Monday, November 11, 2019 02:24 PM, wrote...
>> On 11/11/2019 12:50 PM, Richard Damon wrote:
>>> Writing 20 UTF-32 characters may ALSO print less than 20 glyphs to the
>>> screen.
>> Or more, depending on what you mean by  "glyph". See e.g. U+FDFB (ARABIC
>> LIGATURE JALLAJALALOUHOU,
>> https://www.fileformat.info/info/unicode/char/fdfb/index.htm ) or U+FB03
>> (LATIN SMALL LIGATURE FFI,
>> https://www.fileformat.info/info/unicode/char/fb03/index.htm)
> Thanks for this, Igor.  Again, UTF32 has lots of space, still.  If you look 
> at the representation of these two characters,
>
> ARABIC LETTER JALLAJALALOUHOU UTF-32 (hex) 0xFDFB (fdfb)
> LATIN SMALL LIGATURE FFI UTF-32 (hex) 0xFB03 (fb03)
>
> Look at their hex representations in UTF32:
> 1. 0xFDFB
> 2. 0xFB03
>
> The first 4 0's are still unused spaces.  Japanese, Chinese, etc., glyphs 
> have an unique UTF32 code, so, it will always work.
>
> josé

Unicode has decreed that the highest value that can be called a
code-point is 0x10FFFF, because going higher would break UTF-16, so
there isn't as much room as you might think.

This gives us 1,114,112 possible code points.

There are currently 137,994 code points assigned to characters, 66
assigned as non-characters, 2048 reserved for the surrogates, and a
number reserved for private use, leaving 836,536 currently unassigned.
This says we have some space to grow, but there are still a lot of
archaic and unusual scripts that are being proposed or worked on.
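The arithmetic above can be checked directly; the assigned-character count is the figure quoted in the message (for the Unicode version current at the time), and the private-use total is the BMP Private Use Area plus planes 15 and 16:

```python
total = 0x10FFFF + 1                 # 1,114,112 possible code points
assigned = 137_994                   # assigned characters (per the message)
noncharacters = 66                   # U+FDD0..U+FDEF plus 2 per plane
surrogates = 2_048                   # U+D800..U+DFFF
private_use = 6_400 + 2 * 65_534     # BMP PUA + planes 15/16 (each minus
                                     # its two noncharacters)
unassigned = total - assigned - noncharacters - surrogates - private_use
assert total == 1_114_112
assert unassigned == 836_536         # matches the figure above
```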

-- 
Richard Damon



Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Richard Damon
On 11/11/19 3:49 PM, Jose Isaias Cabrera wrote:
> Richard Damon, on Monday, November 11, 2019 02:37 PM, wrote...
>
>> No.
> Aaaah, my apologies.  We are talking about different things. You are talking 
> about a combination of Unicodes vs. full, character. I take it back.  Yes, if 
> you are combining these, then, of course, you are going to have to a 
> different word count because there are actually characters being involved.  
> are talking pieces vs. full words.  If there is a combination, is just like 
> the accented e, é, why not use the one character vs the combination?
>
> josé

Because not all accented characters have a single code-point. In my
example there is one, because Greek was worked on early. At some point
in the work on Unicode, they realized that there really were too many
combinations occurring in real life to try to assign code-points to all
of them. The same thing happened with the CJK characters: there are a
very large number of them, far more than Unicode wants to give
code-points to, so a large number of archaic forms, currently used
mostly in names, are built with combining characters. (Back to problems
with names.)

The article at http://unicode.org/faq/char_combmark.html gives some
examples, one is:

The Devanagari syllable "ni" must be composed using a base character
"na" (न) followed by a combining vowel for the "i" sound ( ि), although
end users see and think of the combination of the two "नि" as a single
unit of text.
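In a Unicode-aware language that example can be checked directly; a quick Python sketch:

```python
import unicodedata

# The syllable "ni": base letter NA followed by a combining vowel sign.
ni = "\u0928\u093F"   # न + ि, rendered together as नि
assert len(ni) == 2   # two code points, but one user-perceived unit of text
assert unicodedata.name(ni[0]) == "DEVANAGARI LETTER NA"
assert unicodedata.name(ni[1]) == "DEVANAGARI VOWEL SIGN I"
```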

So the question comes, when do you REALLY need to know how many
code-points are in a string, or get a specific number of them? Having a
given number of code-units (or bytes) can be useful for building indexes
where a fixed size makes addressing easier for searching. Counting by
glyphs is sometimes useful at the presentation layer (but needs to be
combined with character widths).

An Input Method would need to deal with the characters as code-points
(likely decomposed), but also probably needs to know about the Glyph to
show the cursor (unless that can be handled by the output method that it
uses).

-- 
Richard Damon



Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Richard Damon
On 11/11/19 2:16 PM, Jose Isaias Cabrera wrote:
> Richard Damon, on Monday, November 11, 2019 12:50 PM, wrote...
>
>> Writing 20 UTF-32 characters may ALSO print less than 20 glyphs to the
>> screen.
> This is not true, if the string has more or at least 20 UTF32 characters, and 
> you request 20 character while still talking UTF32, it will print 20.  Once 
> you move to UTF16 or UTF8, then, yes, you are correct.
You will get twenty code points but not twenty glyphs. UTF-32 has the
property that one code-unit is one code-point (which UTF-8 and UTF-16
don't have), but not that one code-point is one glyph.
>> One quick way to see this is that there is a need for NFD and NFC
>> representations, because some characters can be decomposed from a
>> combined character into a base character + a combining character, so a
>> string in NFD form may naturally 'compress' itself when being printed.
> This is the reason why you want to use UTF32.  UTF8, and UTF16 has to use 
> combination of their character set to cover Eastern languages.  While all 
> languages fit perfectly in UTF32 and they all have their own unique home.
>
> josé

No.

A simple example: Ἀβιά vs Ἀβιά

Both are 4 glyphs or what we would call characters, the first is 6
code-points (U+391, U+313, U+3B2, U+3B9, U+3B1, U+301), the second is 4
code-points (U+1F08, U+3B2, U+3B9, U+3AC)

In this case the decomposed characters happen to match a composed
character, but that is not always true; some less common composed glyphs
do not have a unique single code-point assigned to them.

This shows that 1 code point does not equal 1 character, for the usual
user definition of a character.
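The counts above can be verified with Python's unicodedata module, building the string from the code-points listed:

```python
import unicodedata

nfc = "\u1F08\u03B2\u03B9\u03AC"           # Ἀβιά, composed (NFC) form
nfd = unicodedata.normalize("NFD", nfc)    # decomposed (NFD) form
assert len(nfc) == 4                       # 4 code points, 4 glyphs
assert len(nfd) == 6                       # 6 code points, still 4 glyphs
assert unicodedata.normalize("NFC", nfd) == nfc   # same text either way
```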

There are a NUMBER of points in Unicode where expressing a single glyph
to the user takes multiple code-points. Very shortly after they realized
they needed to extend Unicode beyond the initial 16-bit character set
they first thought it could be, they also realized that they could never
reach the goal of assigning a unique code point to the basic glyphs of
every language, so they settled on letting some (many) glyphs be
expressed as a combination of code-points, with somewhat simple (but not
trivial) rules on how to do this.

-- 
Richard Damon



Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Richard Damon
On 11/11/19 12:39 PM, Jose Isaias Cabrera wrote:
> Richard Damon, on Monday, November 11, 2019 11:19 AM, wrote...
>
>> UTF-32 is a reasonable internal operation format, if code-point
>> operations are important. It does not make a good transmission format,
> I agree.  That is why, I have not created any files for anything as UTF32 for 
> delivery or anything personal. ;-)  It's bulky.  However, space is cheap now, 
> and **I think** UTF32 will be a good uniform character set to use.  But, yes, 
> we are far away from that idea.
>
>> as it is usually takes more media than UTF-8 or UTF-16, and for
>> transmission, the message size is important. The big issue is that
>> code-point counting is rarely what you want, you generally want Glyph
>> counting, which even UTF-32 doesn't provide.
> Yes, agreed.  But, I was just pointing out that a name could be displayed as 
> a symbol.
>
>
>> But this shows that 'Unicode' doesn't handle the name, as is, which was
>> the point of the rule, if you design you software just assuming that
>> Unicode can handle all names, you will be very occasionally be wrong.
> I was talking about the artist previously known as Prince.  I was trying to 
> say that it would be feasible to insert that image/symbol as UTF32. It was 
> more of a joke than pretending to have found the answer. :-)
>
> josé
>
But you can't, as that symbol doesn't have a Unicode code point. To do
this you would need to go BEYOND Unicode and define private-use
characters with their own glyphs.

-- 
Richard Damon



Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Richard Damon
On 11/11/19 12:30 PM, Jose Isaias Cabrera wrote:
> Igor Tandetnik, on Monday, November 11, 2019 11:02 AM, wrote...
>> On 11/11/2019 10:49 AM, Jose Isaias Cabrera wrote:
>>> So, yes, it's bulky, but, if you want to count characters in languages such 
>>> as
>>> Arabic, Hebrew, Chinese, Japanese, etc., the easiest way is to convert that 
>>> string
>>> to UTF32, and do a string count of that UTF32 variable.
>> Between ligatures and combining diacritics, the number of Unicode codepoints 
>> in a
>> string has little practical meaning. E.g. it is not necessarily correlated 
>> with the
>> width of the string as displayed on the screen or on paper; or with the 
>> number of
>> graphemes a human would say the string contains, if asked.
> That could be true, but have you tried to just display an specific number of 
> characters from an UTF8 string having Hebrew, Arabic, Chinese, Japanese (see 
> below).
>
>>> Most people have to figure out what Unicode they are using, count the 
>>> bytes, divide
>>> by... and on, and on.  Not me, I just take that UTF8, or UTF16 string, 
>>> convert it to
>>> UTF32, and do a count.
>> And then what do you do with that count? What do you use it for?
> Say that I am writing a report and I only want to print the first 20 
> characters of a string, that would be something like,
> if (var.length> 20)
> {
>   writefln(var[0 .. 20]);
> }
> else
> {
>   writefln(var ~ "   "[0 .. 20]);
> }
> if var is declared UTF8, and there is a Chinese string or some multi-byte 
> language in that string, this will never print 20 Chinese characters. It will 
> print less.  If, I convert that UTF8 string to UTF32, then each multi-byte 
> character fits in one UTF32 character.  So,
>
> dchar[] var32 = std.utf.toUTF32(var);
> if (var32.length> 20)
> {
>   writefln(var32[0 .. 20]);
> }
> else
> {
>   writefln(var32 ~ cast(dchar[])"   "[0 .. 20]);
> }
>
> This will always print 20 characters, whether these are ASCII or multi-byte 
> language characters.  Thanks.
>
> josé

Writing 20 UTF-32 characters may ALSO print less than 20 glyphs to the
screen.

One quick way to see this is that there is a need for NFD and NFC
representations, because some characters can be decomposed from a
combined character into a base character + a combining character, so a
string in NFD form may naturally 'compress' itself when being printed.
Then you need to remember that one of the reasons for providing the
combining characters was the decision not to create code points for all
the possible composed characters; many would be expressed only in
decomposed form, as a base character + combining character(s). Thus
code-point count is not the same as output glyph count. In fact, Unicode
works hard at avoiding the term 'character', as it isn't well defined,
often being thought of as a glyph, but also sometimes as a code-point.
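A minimal illustration of an NFD string losing part of a glyph when truncated by code-point count (Python, with a short Latin example standing in for the 20-character case above):

```python
import unicodedata

s = unicodedata.normalize("NFD", "caf\u00e9")  # 'e' + COMBINING ACUTE ACCENT
assert len("caf\u00e9") == 4   # 4 code points in composed (NFC) form
assert len(s) == 5             # 5 code points in NFD form, same 4 glyphs
assert s[:4] == "cafe"         # truncating by code points drops the accent
```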

-- 
Richard Damon



Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Richard Damon
On 11/11/19 12:13 PM, Simon Slavin wrote:
> On 11 Nov 2019, at 4:02pm, Igor Tandetnik  wrote:
>
>> And then what do you do with that count? What do you use it for?
> This is a key point.  When I started programming I used to do LEFT(A$(I), 14) 
> frequently.  But almost all of them were because I wanted to print the string 
> and had allocated 14 characters of space to in.
>
> Then came variable-width fonts.  The practise should have died out.  But 
> people are still doing it.
>
> There are other reasons to get the beginning of a string.  The first 
> character alone, especially.  There may be other reasons to get its length.  
> But it was mostly done because the length of the string was its width on the 
> display.  And it isn't any more.

And you do understand that getting the first 'character' (if you mean
the first printed glyph) of a string in UTF-32 is not trivial, because
it could easily be more than one code-point due to combining characters.
For many purposes it might not even include the first code-point, as
that might be a formatting code point like the BOM or a text-direction
mark which should be skipped.
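For instance, in Python (which indexes strings by code point, like UTF-32):

```python
s = "\ufeffHello"              # text that arrived with a leading BOM
assert s[0] == "\ufeff"        # the first code point is not the first glyph
assert s.lstrip("\ufeff")[0] == "H"   # skip formatting code points first
```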

-- 
Richard Damon



Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Richard Damon
On 11/11/19 10:49 AM, Jose Isaias Cabrera wrote:
>
> Richard Damon, on Monday, November 11, 2019 09:47 AM, wrote...
>> On 11/11/19 9:26 AM, Jose Isaias Cabrera wrote:
>>> Simon Slavin, on Monday, November 11, 2019 08:50 AM, wrote...
>>>> On 11 Nov 2019, at 1:35pm, Jose Isaias Cabrera, on
>>>>
>>>>> Not if the system uses UTF32. :-) You could put the pictograph in that 
>>>>> that textbox, and it'll work.
>>>> Can you point to some description of this and how it works ?  I've never 
>>>> heard of it.
>>> My point was that one could define the UTF32 [1] code for that specific 
>>> pictograph or glyph, and it'll work.
>>>
>>> josé
>>>
>>> [1] https://en.wikipedia.org/wiki/UTF-32
>> UTF-32 gives no encoding advantage over other Unicode formats, as all
>> allow expressing all the Unicode code points.
> I disagree.  I believe that the future is UTF32.  I will give you that it's 
> bulky, for example, here is the letter a written to a file in Windows-1252, 
> UTF8 signed, UTF16be signed, a UTF32be signed:
>
> bytes filename
> 1 0_Windows-1252.txt
> 4 1_UTF8signed.txt
> 4 2_UTF16BEsigned.txt
> 8 3_UTF32signed.txt
>
> So, yes, it's bulky, but, if you want to count characters in languages such 
> as Arabic, Hebrew, Chinese, Japanese, etc., the easiest way is to convert 
> that string to UTF32, and do a string count of that UTF32 variable.  Most 
> people have to figure out what Unicode they are using, count the bytes, 
> divide by... and on, and on.  Not me, I just take that UTF8, or UTF16 string, 
> convert it to UTF32, and do a count.
UTF-32 is a reasonable internal operating format, if code-point
operations are important. It does not make a good transmission format,
as it usually takes more space than UTF-8 or UTF-16, and for
transmission, message size is important. The big issue is that
code-point counting is rarely what you want; you generally want glyph
counting, which even UTF-32 doesn't provide.
>
>> There is no code-point assigned to the Pictogram for his name (As far as
>> I know), so their is no value you can put in represent it.
> You're right, but not that many people are changing their name to an image.  
> However, if two or three or more folks want to, there are enough empty UTF32 
> characters, that it can be accomplished.
But this shows that 'Unicode' doesn't handle the name as-is, which was
the point of the rule: if you design your software just assuming that
Unicode can handle all names, you will very occasionally be wrong.
There are actually many more cases of this; I imagine a lot of
aboriginal people whose writing systems haven't been adopted by Unicode
have names (as their preferred name) that can't be expressed in official
Unicode. They may have a Government-assigned 'official' name (if they
have had to interact with the Government) that can be represented, but
that really isn't their name (Prince just had the resources and gall to
do it 'officially').
>
>
>> It would be possible to include in the application some way to add user
>> defined glyphs to the system fonts for user defined code points, and
>> then reconcile these when transferring data from one system to another.
> We have done this for special customer requirements and have assigned our own 
> UTF32 characters an specific design with our software.  But, yes, it's only 
> our software, but what if... a reconciliation can happen?
>
> josé


-- 
Richard Damon



Re: [sqlite] Things you shouldn't assume when you store names

2019-11-11 Thread Richard Damon
On 11/11/19 9:26 AM, Jose Isaias Cabrera wrote:
> Simon Slavin, on Monday, November 11, 2019 08:50 AM, wrote...
>> On 11 Nov 2019, at 1:35pm, Jose Isaias Cabrera, on
>>
>>> Not if the system uses UTF32. :-) You could put the pictograph in that that 
>>> textbox, and it'll work.
>> Can you point to some description of this and how it works ?  I've never 
>> heard of it.
> My point was that one could define the UTF32 [1] code for that specific 
> pictograph or glyph, and it'll work.
>
> josé
>
> [1] https://en.wikipedia.org/wiki/UTF-32

UTF-32 gives no encoding advantage over other Unicode formats, as all
allow expressing all the Unicode code points.

There is no code-point assigned to the pictogram for his name (as far as
I know), so there is no value you can put in to represent it.

There are a number of code points reserved for user definition, but many
of those have been informally reserved for characters not yet put into
Unicode.

It would be possible to include in the application some way to add user
defined glyphs to the system fonts for user defined code points, and
then reconcile these when transferring data from one system to another.

Another option would be to define some user-defined code point pair as a
graphics escape, and put within it an encoding of a graphics file
containing the glyph, but at that point you are really outside of being
'Unicode'.

-- 
Richard Damon



Re: [sqlite] Things you shouldn't assume when you store names

2019-11-10 Thread Richard Damon
On 11/10/19 1:21 AM, Gary R. Schmidt wrote:
> On 10/11/2019 13:44, Doug wrote:
>> Au Contraire, Jens! In many local contexts you can normalize people's
>> names. I was born in Kansas, USA. My parents filled out a birth
>> certificate for me. It had a place on the form for first name, middle
>> name, last name, and a suffix like II or III.
>>
>> That birth certificate form determined that everyone born in Kansas
>> (at that time), had a first, middle, and last name. There was no
>> discussion of the matter. That's the way it was. The form led the
>> way; people never thought about whether it was effective or not. Each
>> newly-born child was given a first, middle, and last name.
>>
>> Effective was irrelevant for that system. There was no option, no
>> alternative. It simply was.
>>
>> All systems are like that at each moment in time. They are what they
>> are at any moment in time, and they force the users to behave the way
>> the system wants them to behave. If you want to change the system and
>> momentum is on your side, then immediately you have a new system - at
>> that moment in time. It is composed of the old system and the momentum.
>>
>> Back to names: just like the birth certificate, a system which
>> assigns a name to you, actually coerces you to have that name,
>> because within that system, you exist as that name. The "names"
>> article is totally wrong when it says that each assumption is wrong.
>> Each of those assumptions is correct, and I can find at least one
>> system which makes each one correct. Within each system, the
>> assumption works, and is valid.
>>
>> My two cents...
> Is not worth the paper it is written on!
>
> So what happens when someone from a family who only uses first- and
> last-names moves to Kansas?
>
> Do they have to make up a middle-name so that he idiots can fill out
> the forms?
>
> Well, in the case of the US Navy back in the late 1980's, when a
> friend of mine from here in Australia, who only has a first and
> last-name married a USN pilot and moved to the USA, she was told that,
> "Yes, you have a middle name."  No amount of arguing, or producing of
> official documents, (well, it's the USA, most people there don't know
> what a passport is), could prevail.  In the end she conceded defeat
> and became  Doe , for the duration.
>
> Names are impossible, unless you use a free-form, infinite-length
> field, you won't be safe, and even then, someone with turn up whose
> name is 'n' recurring to an infinite number of characters or something!
>
> Cheers,
>     Gary    B-) 
Actually, 'The Artist whose name formerly was Prince' (which wasn't his
name, his legal name was an unpronounceable pictograph), breaks every
computer system I know.

-- 
Richard Damon



Re: [sqlite] Things you shouldn't assume when you store names

2019-11-09 Thread Richard Damon
But the main point of the document is that just because you know how
things 'must' be where you are doesn't mean that every name you need to
handle will be built on those same rules. I can think of cases which
show problems with most of those rules. The key is that each of the
rules sounds like a rule that someone has assumed, and each is something
for which the author of the article knows of (or at least can think of)
a case where it doesn't hold. A rule that holds only 99.999% of the time
is not always true.

On 11/9/19 9:44 PM, Doug wrote:
> Au Contraire, Jens! In many local contexts you can normalize people's names. 
> I was born in Kansas, USA. My parents filled out a birth certificate for me. 
> It had a place on the form for first name, middle name, last name, and a 
> suffix like II or III.
>
> That birth certificate form determined that everyone born in Kansas (at that 
> time), had a first, middle, and last name. There was no discussion of the 
> matter. That's the way it was. The form led the way; people never thought 
> about whether it was effective or not. Each newly-born child was given a 
> first, middle, and last name.
>
> Effective was irrelevant for that system. There was no option, no 
> alternative. It simply was.
>
> All systems are like that at each moment in time. They are what they are at 
> any moment in time, and they force the users to behave the way the system 
> wants them to behave. If you want to change the system and momentum is on 
> your side, then immediately you have a new system - at that moment in time. 
> It is composed of the old system and the momentum.
>
> Back to names: just like the birth certificate, a system which assigns a name 
> to you, actually coerces you to have that name, because within that system, 
> you exist as that name. The "names" article is totally wrong when it says 
> that each assumption is wrong. Each of those assumptions is correct, and I 
> can find at least one system which makes each one correct. Within each 
> system, the assumption works, and is valid.
>
> My two cents...
> Doug 
>
>> -Original Message-
>> From: sqlite-users 
>> On Behalf Of Jens Alfke
>> Sent: Saturday, November 09, 2019 5:11 PM
>> To: SQLite mailing list 
>> Subject: Re: [sqlite] Things you shouldn't assume when you store
>> names
>>
>> On Nov 9, 2019, at 1:09 PM, sky5w...@gmail.com wrote:
>>> In this case, data modelers hoping to save a column. arrggg.
>>> It flies in the face of data normalization and pushes the
>> problem down the
>>> line.
>> But you _cannot_ normalize people’s names; that’s the exact point
>> of that article. Anything you assume about the structure of a name
>> will be wrong in some culture.
>>
>> -Jens


-- 
Richard Damon



Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Richard Damon
On 11/7/19 5:13 PM, Doug Currie wrote:
> On Thu, Nov 7, 2019 at 4:23 PM Richard Damon 
> wrote:
>
>> One thought would be to generate a ‘hash’ from part of the record, maybe
>> the record ID, and select records based on that value. The simplest would
>> be something like id%100 == 0 would get you 1% of the records. That
>> admittedly isn’t that random.
>>
>> Put the ID through a linear congruential generator, something like
>>
>> mod(a * Id + b, c) % 100 == 0
>>
>> And you will pretty well scramble the selection
>>
>>
> Yes, and if a, b, and c come from a randomization table, they can be
> modified to obtain a different pseudo-random set.
>
> e
a, b, and c should be chosen to give a reasonable random-number
generator (there are tables of good values), rather than being arbitrary
values. The 100 and the 0 can be changed (or some other test applied to
the random number) to get different selections.

-- 
Richard Damon



Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Richard Damon
> On Nov 7, 2019, at 2:15 PM, Merijn Verstraaten  wrote:
> 
> 
>> On 7 Nov 2019, at 19:16, David Raymond  wrote:
>> 
>> Along those lines SQLite includes the reverse_unordered_selects pragma
>> https://www.sqlite.org/pragma.html#pragma_reverse_unordered_selects
>> which will flip the order it sends rows in queries that don't explicitly 
>> specify an ordering. It's there to assist you in finding spots in your code 
>> where you might be relying on implicit ordering when you really shouldn't be.
> 
> Like the rest of this threads, this is just pointing out why the things in my 
> initial email don't work, but I already knew that. Which is why I asked for 
> help to see if there is a way to do what I want that *does* work. I don't 
> care particularly about the details of "can I control the order the condition 
> is evaluated", it's just that all reasonable ways to sample large streams 
> that I know would require a deterministic order.
> 
> If someone has a different/better idea on how to return just a random sample 
> from a query in a repeatable way, I'm all ears.
> 
> So far the only suggestion was "use some non-deterministic random sampling 
> method and store the result", but since my samples are large and I have lots 
> of them, this would balloon my storage by >100x and I don't have the 
> available storage to make that work.
> 
> - Merijn
> 

One thought would be to generate a ‘hash’ from part of the record, maybe the 
record ID, and select records based on that value. The simplest would be 
something like id%100 == 0 would get you 1% of the records. That admittedly 
isn’t that random.

Put the ID through a linear congruential generator, something like

mod(a * Id + b, c) % 100 == 0

And you will pretty well scramble the selection
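A sketch of that idea in SQLite (via Python); the constants below are illustrative placeholders, not vetted generator parameters:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
con.executemany("INSERT INTO t (id, v) VALUES (?, ?)",
                [(i, "row%d" % i) for i in range(1, 10001)])

# Scramble the rowid with an LCG-style hash, then keep ~1% of rows.
# a, b, c here are illustrative placeholders, not vetted LCG constants.
a, b, c = 1103515245, 12345, 2**31
query = "SELECT id FROM t WHERE ((? * id + ?) % ?) % 100 = 0"
sample = con.execute(query, (a, b, c)).fetchall()

# Deterministic: re-running the same query yields the same sample.
assert sample == con.execute(query, (a, b, c)).fetchall()
```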


Re: [sqlite] Roadmap?

2019-10-27 Thread Richard Damon
On 10/26/19 8:07 PM, Thomas Kurz wrote:
>> Feel free to make suggestions. Which missing feature or features causes 
> you the most bother?
>
> Thanks, Dan.
>
> To me, the most puzzling thing is the lack of full ALTER TABLE support (DROP 
> COLUMN, MODIFY COLUMN, ADD CONSTRAINT, DROP CONSTRAINT). Modifying tables is 
> some kind of science in SQLite, and thus, very error-prone. I'd be willing to 
> donate for that but as a private user I cannot affort 85 k$ ;-)
>
My one thought about fuller ALTER TABLE support would be that since
SQLite stores the schema as the literal CREATE TABLE command, ALTER
TABLE needs to know how to edit that string to match the changes, and
simple additions are much easier than other forms of editing.

Also, due to the way SQLite handles rows, adding columns will
automatically assume default values for all existing rows. Anything much
more complicated could easily require SQLite to update the whole table
row by row, and that would likely require making sure that nothing else
was looking at the database, as it might not be possible to keep the
view consistent.
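For changes that can't be done by a simple add, the usual workaround is to rebuild the table; a sketch of dropping a column that way (table and column names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, keep TEXT, drop_me TEXT);
    INSERT INTO t VALUES (1, 'a', 'x'), (2, 'b', 'y');
""")

# Rebuild the table without the unwanted column, then swap names.
con.executescript("""
    BEGIN;
    CREATE TABLE t_new (id INTEGER PRIMARY KEY, keep TEXT);
    INSERT INTO t_new (id, keep) SELECT id, keep FROM t;
    DROP TABLE t;
    ALTER TABLE t_new RENAME TO t;
    COMMIT;
""")
assert con.execute("SELECT keep FROM t ORDER BY id").fetchall() == [('a',), ('b',)]
```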

-- 
Richard Damon



Re: [sqlite] Roadmap?

2019-10-27 Thread Richard Damon
On 10/27/19 10:24 AM, Simon Slavin wrote:
> On 27 Oct 2019, at 9:12am, Thomas Kurz  wrote:
>
>> the discussion standard deviation has shown that at least STDEV and POWER 
>> would be very helpful if they part of SQLite core.
> These are presentation issues.  Not database issues.   The results of such 
> calculations are unlikely to be used to decide on the continuation of a SQL 
> statement.

Doing a GROUP BY summary query to get the mean and stdev of a sample in
the database, and then using those to construct a WHERE clause to
get/exclude data points that are outliers, would seem to be a very
reasonable sort of query that really is a database issue. This can
normally be done as two distinct queries (ideally within a read
transaction, so we get consistent data), with the stdev and mean built
from a count of records plus the sums of the value and value**2, computing
the mean and stdev from those in code, as was shown earlier in the thread.
Still, having STDEV as a built-in aggregate function could be useful.
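A sketch of that two-query pattern, with an invented samples table and a 1-sigma cutoff chosen purely to make the toy data interesting:

```python
import math
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE samples(grp INTEGER, x REAL)")
con.executemany("INSERT INTO samples VALUES (?, ?)",
                [(1, v) for v in (10.0, 12.0, 11.0, 50.0, 9.0)])

# Pass 1: count, sum(x) and sum(x*x); the mean and (population) stdev
# fall out of those three aggregates without needing SQRT in SQL.
n, s, s2 = con.execute(
    "SELECT COUNT(x), SUM(x), SUM(x*x) FROM samples WHERE grp = 1").fetchone()
mean = s / n
stdev = math.sqrt(s2 / n - mean * mean)

# Pass 2: feed the computed bounds back into a WHERE clause to drop
# the outliers.
kept = con.execute(
    "SELECT x FROM samples WHERE grp = 1 AND x BETWEEN ? AND ?",
    (mean - stdev, mean + stdev)).fetchall()
print(mean, stdev, [v for (v,) in kept])
```

Wrapping both passes in one read transaction keeps the two queries consistent with each other.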

-- 
Richard Damon



Re: [sqlite] Roadmap?

2019-10-21 Thread Richard Damon
On 10/20/19 11:03 PM, Rowan Worth wrote:
> On Sun, 20 Oct 2019 at 17:04, Simon Slavin  wrote:
>
>> Another common request is full support for Unicode (searching, sorting,
>> length()).  But even just the tables required to identify character
>> boundaries are huge.
>>
> Nitpick: there are no tables required to identify character boundaries. For
> utf-8 you know if there's another byte to come which is part of the current
> codepoint based on whether the current byte's high bit is set, and
> furthermore you know how many bytes to expect based on the initial byte.
>
> I'm less familiar with utf-16 which SQLite has some support for, but a
> quick read suggests there are exactly two reserved bit patterns you need to
> care about to identify surrogate pairs and thus codepoint boundaries.
>
> Tables relating to collation order, character case, and similar codepoint
> data can of course get huge, so your point stands.
> -Rowan

My memory is that Unicode is somewhat careful NOT to define what a
'character' is, because that can get really complicated and is often
application-specific.

You have code-units, which for utf-8 are basically bytes.

You have code-points, which are what most people think of as 'characters';
each has a single Unicode code point number.

Then you have graphemes, which are clusters of code-points that tend to be
expressed as a single glyph in output (and some code-points don't generate
any output at all).

Dealing with graphemes gets complicated, and that is where you run into
the need for lots of tables. Code-points themselves are fairly simple to
deal with; the problem is that in some languages, working only with
code-points doesn't let you handle even the 'simple' operations like
sorting or case folding with 100% accuracy, as those sometimes require
dealing with code-point clusters.

But you also run into the issue (as I understand it) that Unicode doesn't
define a universal ordering for all characters; collation is a
language-specific problem that Unicode can't really solve on its own. (Two
languages might use some of the same characters but treat them differently
for sorting.)
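The code-unit / code-point / grapheme distinction shows up even for a single accented letter; a short Python illustration:

```python
import unicodedata

# "e-acute" can be one code point (U+00E9) or two (e + combining acute
# U+0301); both render as one grapheme, and their UTF-8 code-unit
# counts differ as well.
composed = "\u00e9"
decomposed = "e\u0301"

cp1, cu1 = len(composed), len(composed.encode("utf-8"))
cp2, cu2 = len(decomposed), len(decomposed.encode("utf-8"))
print(cp1, cu1, cp2, cu2)

# Canonical normalization (NFC) folds the pair back into one code point,
# which is why naive code-point comparison alone can mis-handle text.
print(unicodedata.normalize("NFC", decomposed) == composed)
```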

-- 
Richard Damon



Re: [sqlite] Standard deviation last x entries

2019-10-20 Thread Richard Damon
rom vw_StdDev_Max4_Aggregates Where n>1
>>> Union All
>>> Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1;
>>>
>>> Create View vw_StdDev_Populations As
>>> Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", (
>>>With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All
>>>  Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
>>> Limit 32
>>>) Select s From r Order By i Desc Limit 1
>>> ) sigma From vw_StdDev_Max4_Aggregates Where n>1
>>> Union All
>>> Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1;
>>>
>>> 
>>> Ok, here the result I get on my test-table, when the 3 views
>>> are in place, and the following gets executed:
>>>
>>> Select ID, n_PopulationSize, sigma From vw_StdDev_Populations;
>>> ID  n   sigma
>>> 
>>> 3   4   14.7901994577491
>>> 5   3   8.16496580927733
>>> 6   2   2.5
>>> 9   1   0
>>>
>>>
>>> Select ID, n_SampleSize, s From vw_StdDev_Samples;
>>> ID  n   s
>>> 
>>> 3   4   17.0782512765993
>>> 5   3   10.1
>>> 6   2   3.53553390593274
>>> 9   1   null
>>>
>>> Please note the returned null (vs. the real 0) in the last result
>>> (done, because "unbiased estimates" are undefined for sample-size == 1)
>>>
>>> Finally for those interested, the CTE-based "iterated Square-Root",
>>> cut out for an isolated calculation of sqrt(2).
>>>
>>> With r(s2, s, i) As (Select 2, 1, 1 Union All
>>>Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
>>> Limit 32
>>> ) Select s From r Order By i Desc Limit 1
>>>
>>> The max. iterations are limited to 32 - though for input-values
>>> in a "typical range", the iteration will usually exit earlier...
>>> In case of the example above -> sqrt(2), the iteration-counter
>>> i reached only 6 (as can be seen, when you include i alongside s
>>> in the final Select.
>>>
>>> With r(s2, s, i) As (Select 2, 1, 1 Union All
>>>Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
>>> Limit 32
>>> ) Select i, s From r Order By i Desc Limit 1
>>>
>>> HTH
>>>
>>> Olaf
>>>
>>>
>>>
>>
>>
>> --
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com
>
>

-- 
Richard Damon



Re: [sqlite] Standard deviation last x entries

2019-10-12 Thread Richard Damon
One thing to point out: it sounds like you are dealing with a sample, and
I think you want the estimated standard deviation of the underlying
process. That says you really want the adjusted formula with N-1 in the
denominator, because the expected standard deviation of a sample is
smaller than the standard deviation of the process/population.

One quick test is to think about what you would do with just a single
point. Does that mean you really have a standard deviation of zero (the SD
of a population of one), or does it mean you have no idea what the SD of
the population is?
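A quick numeric illustration of the N vs. N-1 distinction, using Python's statistics module on the sample values posted later in the thread:

```python
import statistics

data = [130, 120, 140, 100, 110]

# pstdev divides by N (population); stdev divides by N-1 (the sample
# estimate of the underlying process).
pop = statistics.pstdev(data)
samp = statistics.stdev(data)
print(pop, samp)  # the sample estimate is never the smaller of the two

# And for a single point the sample estimate is simply undefined:
try:
    statistics.stdev([42])
except statistics.StatisticsError:
    print("undefined for a sample of one")
```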

On 10/12/19 12:13 PM, Bart Smissaert wrote:
> Thanks, I do know how to calculate the SD in code, but I thought in this
> particular case it might be faster to do this in SQL.
> Only problem is the square root and for that reason I will test this in
> code as well and see how it compares with SQL.
>
> I found a way to get the one from last step, so that is without doing the
> final square root step.
> Note that this has to do with a number of systolic blood pressure readings
> and I am actually taking the last 6 and  that is useful as due
> to treatment the SD often will be less for the later values:
>
> UPDATE QR3PARAMS SET AVG_BP =
> (SELECT AVG(SYST) FROM (SELECT SYSTOLIC AS SYST FROM BP WHERE ID =
> QR3PARAMS.ID ORDER BY ENTRY_DATE DESC LIMIT 6))
>
> UPDATE QR3PARAMS SET DEV_BP =
> (SELECT AVG(SYST * SYST) FROM (SELECT SYSTOLIC - QR3PARAMS.AVG_BP AS SYST
> FROM BP WHERE ID = QR3PARAMS.ID ORDER BY ENTRY_DATE DESC LIMIT 6))
>
> RBS
>
>
>
>
> On Sat, Oct 12, 2019 at 4:21 PM Keith Medcalf  wrote:
>
>> The Standard Deviation of the Population is the Square Root of the mean of
>> the second order differences.
>>
>> For your input values, you calculate the mean.
>> Then you calculate the mean of the square of the difference between each
>> value and the mean of the values.
>> Then you take the square root of that.
>>
>> You can also compute it using a running calculation (generally more
>> accurate on IEEE754 hardware).
>>
>> So just retrieve the values you are interested in and do the calculations
>> in your application.  That is, if you only want FOUR values, then write a
>> query that returns only those FOUR values, and use those to compute the
>> answer.  While you can do all the calculations (including the Square Root)
>> in SQL, it is likely to be horrendously slow.
>>
>> However, before you can do *ANYTHING* else, you need to be able to
>> demonstrate that you can write a select statement that retrieves the values
>> you want to work on, only the values you want to work on, and nothing but
>> the values you want to work on.  Having 47,000 values and saying that you
>> would like to find the standard deviation of some set of four of them is
>> entertaining, but not useful in any way.
>>
>> --
>> The fact that there's a Highway to Hell but only a Stairway to Heaven says
>> a lot about anticipated traffic volume.
>>
>>> -Original Message-
>>> From: sqlite-users  On
>>> Behalf Of Bart Smissaert
>>> Sent: Saturday, 12 October, 2019 08:09
>>> To: General Discussion of SQLite Database >> us...@mailinglists.sqlite.org>
>>> Subject: [sqlite] Standard deviation last x entries
>>>
>>> How do I get the standard deviation of the last 4 entries (there could be
>>> less than 4) of an integer column grouped by an integer ID entry in
>>> another
>>> column in the same table.
>>>
>>> So data could be like this:
>>>
>>> ID xValue
>>> 
>>> 1   130
>>> 1   120
>>> 1   140
>>> 1   100
>>> 1   110
>>> 2   140
>>> 2   130
>>> 2   150
>>>
>>> I know that SQLite doesn't have the SQRT function (and I can't make a UDF
>>> as I am doing this in an Android phone app) but I can do this last step
>>> in
>>> code.
>>>
>>> RBS
>>
>>
>>


-- 
Richard Damon



Re: [sqlite] Standard deviation last x entries

2019-10-12 Thread Richard Damon
On 10/12/19 11:23 AM, Richard Damon wrote:
> On 10/12/19 10:08 AM, Bart Smissaert wrote:
>> How do I get the standard deviation of the last 4 entries (there could be
>> less than 4) of an integer column grouped by an integer ID entry in another
>> column in the same table.
>>
>> So data could be like this:
>>
>> ID xValue
>> 
>> 1   130
>> 1   120
>> 1   140
>> 1   100
>> 1   110
>> 2   140
>> 2   130
>> 2   150
>>
>> I know that SQLite doesn't have the SQRT function (and I can't make a UDF
>> as I am doing this in an Android phone app) but I can do this last step in
>> code.
>>
>> RBS
> I may not be the best at forming the SQL to do this, but here is the
> general idea I would use. First use a query with ORDER BY ... DESC and
> LIMIT to get the last 4 items, and select the value plus a computed
> column value2 = value * value.
>
> Then over this result, do a GROUP BY to compute N = count(value), S =
> sum(value), S2 = sum(value2)
>
> You can then compute the variance (standard deviation squared) from the
> equation
>
> Variance = S2 / N - (S / N) * (S / N)
>
> This equation assumes that you are working with the full population and
> not just a sample; if your data is to be considered a sample, you need to
> make a small adjustment in the formula:
>
> Estimated Variance of Population = S2 / (N-1) - S * S / (N * (N-1))
>
>
(Sorry about the messed up subject, reposting to fix it)

-- 
Richard Damon



[sqlite] /

2019-10-12 Thread Richard Damon
On 10/12/19 10:08 AM, Bart Smissaert wrote:
> How do I get the standard deviation of the last 4 entries (there could be
> less than 4) of an integer column grouped by an integer ID entry in another
> column in the same table.
>
> So data could be like this:
>
> ID xValue
> 
> 1   130
> 1   120
> 1   140
> 1   100
> 1   110
> 2   140
> 2   130
> 2   150
>
> I know that SQLite doesn't have the SQRT function (and I can't make a UDF
> as I am doing this in an Android phone app) but I can do this last step in
> code.
>
> RBS

I may not be the best at forming the SQL to do this, but here is the
general idea I would use. First use a query with ORDER BY ... DESC and
LIMIT to get the last 4 items, and select the value plus a computed
column value2 = value * value.

Then over this result, do a GROUP BY to compute N = count(value), S =
sum(value), S2 = sum(value2)

You can then compute the variance (standard deviation squared) from the
equation

Variance = S2 / N - (S / N) * (S / N)

This equation assumes that you are working with the full population and
not just a sample; if your data is to be considered a sample, you need to
make a small adjustment in the formula:

Estimated Variance of Population = S2 / (N-1) - S * S / (N * (N-1))
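A sketch wiring the pieces together (an invented bp table, where seq stands in for ENTRY_DATE), with both formulas checked against Python's statistics module:

```python
import sqlite3
import statistics

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bp(seq INTEGER PRIMARY KEY, id INTEGER, x REAL)")
vals = ([(1, v) for v in (130, 120, 140, 100, 110)]
        + [(2, v) for v in (140, 130, 150)])
con.executemany("INSERT INTO bp(id, x) VALUES (?, ?)", vals)

# Inner query: the last 4 entries for one ID; outer query: the three
# aggregates that the variance formulas need.
n, s, s2 = con.execute("""
    SELECT COUNT(x), SUM(x), SUM(x * x)
    FROM (SELECT x FROM bp WHERE id = 1 ORDER BY seq DESC LIMIT 4)
""").fetchone()

pop_var = s2 / n - (s / n) * (s / n)             # full population
est_var = s2 / (n - 1) - s * s / (n * (n - 1))   # sample estimate
print(n, pop_var, est_var)
```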


-- 
Richard Damon



Re: [sqlite] Hello everyone, I found some strange behavior when using the Command Line Shell For SQLite.

2019-10-03 Thread Richard Damon
On 10/3/19 5:46 AM, 算 子 wrote:
> In column mode, each record is shown on a separate line with the data aligned 
> in columns. For example:
>
> sqlite> .mode column
> sqlite> select * from tbl1;
> one   two   
> --  --
> hello 10
> goodbye   20
> sqlite>
>
>
> But I found that only when the column text is ANSI does it align properly.
>
> sqlite> .mode column
> sqlite> select * from 语言声明表;
> 语言指针        引用数量        语言名称        语言注释
> --  --  --  --
> 1           0           1           0
> 2           0           2           0
>
> If the column text is UTF-8, it doesn't align.
> What should I do about it?

My guess is that column mode doesn't understand all the strangeness that
comes with Unicode and fonts, and probably expects each code point to be
exactly the same width. I note that at least on my computer the CJK
characters are wider than the ASCII characters, which makes the columns
not align.
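That width difference is queryable; a sketch using Python's unicodedata (the two-cell rule below is a crude approximation for terminal layout, not what the SQLite shell actually does):

```python
import unicodedata

# CJK characters report East Asian width 'W' (wide); ASCII reports 'Na'.
widths = [unicodedata.east_asian_width(c) for c in "a1语言"]
print(widths)

def display_width(s):
    # Approximation: wide/fullwidth characters take two terminal cells.
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
               for c in s)

w1, w2 = display_width("hello"), display_width("语言名称")
print(w1, w2)
```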

-- 
Richard Damon



Re: [sqlite] Conflict between snapshots and checkpoints

2019-09-30 Thread Richard Damon
On 9/30/19 4:28 AM, Gwendal Roué wrote:
> According to
> http://dbmsmusings.blogspot.com/2019/06/correctness-anomalies-under.html,
> SNAPSHOT ISOLATION is stronger than REPEATABLE READ, in that it prevents
> "phantom reads" (
> http://dbmsmusings.blogspot.com/2019/05/introduction-to-transaction-isolation.html).
> I think SQLite prevents phantom reads, and so we actually get actual
> SNAPSHOT ISOLATION (as written in https://www.sqlite.org/isolation.html).

Based on a bit of research with things like
https://en.wikipedia.org/wiki/Snapshot_isolation

Snapshot isolation appears to be a term of art, not a standardized term,
so it can be forgiven if it isn't used in the documentation. Also, it
seems to imply non-serialized writes, which SQLite does NOT provide, so
the term isn't really applicable anyway.

-- 
Richard Damon



Re: [sqlite] Conflict between snapshots and checkpoints

2019-09-29 Thread Richard Damon
On 9/29/19 11:40 AM, Gwendal Roué wrote:
> Thank you very much Keith.
>
> Apologies for my imprecise vocabulary, and the use of the same "snapshot"
> word with different meanings.
>
> I have used the term "snapshot isolation" as used in
> https://www.sqlite.org/isolation.html; But I'll remember about
> REPEATABLE-READ isolation.
>
> I also thank you very much for confirming that preventing checkpoints
> allows the WAL markers to remain valid.
>
> My goal is to provide a set of concurrency primitives that are useful for
> GUI app developers. I'm learning a lot about SQLite in the process, of
> course.

And since you reused the connection (otherwise you would just keep the
transaction open and the snapshot would be preserved), you don't have
isolation.

Quote:


No Isolation Between Operations On The Same Database Connection

SQLite provides isolation between operations in separate database
connections. However, there is no isolation between operations that
occur within the same database connection.



-- 
Richard Damon



Re: [sqlite] Conflict between snapshots and checkpoints

2019-09-29 Thread Richard Damon
On 9/29/19 3:27 AM, Gwendal Roué wrote:
> Thank you, Richard
>
> But now I fail to understand the intended use case of sqlite3 snapshots.
> Why allow to reuse snapshots with several calls to open()? Why do they
> exist at all, since we can already profit from snapshot isolation with one
> transaction (at the cost of keeping it open)?
>
> For information, my quest for snapshot protection has met some success (
> https://github.com/groue/GRDB.swift/pull/625)
>
> Given I control the connections to a given database file, I am able to open
> one writer connection, and, on demand, up to N read-only connections. Those
> N reader connections allow concurrent database reads. Those "reads" are
> generally wrapped in a deferred transaction which provides snapshot
> isolation. At the end of this transaction, the reader connection becomes
> available for another read.
>
> One can now create a "snapshot". Those snapshots use the same pool of N
> readers: snapshot reads are then wrapped in a deferred transaction and
> sqlite3_snapshot_open. At the end of this transaction, the reader
> connection becomes available for another regular read or for another
> snapshot read.
>
> In order to protect unused snapshots, checkpoints are prevented as long as
> there exists snapshots, with sqlite3_wal_hook().
>
> I **really** hope this protects snapshots for good. Of course, a
> confirmation from knowledgeable people would be appreciated :-)
>
> Gwendal
I won't say I know why they were put in, but here is one possibility I
can think of. Your model assumes you have total control over, and
confidence in, the entire application (and if you do, you can protect the
snapshots yourself). By passing the snapshot, rather than the connection
holding the transaction, you can be sure the receiver can't disturb the
session and invalidate the snapshot. By opening the snapshot, the receiver
can recreate the database at that point in time and read its state, but
can't accidentally invalidate it.

-- 
Richard Damon



Re: [sqlite] Conflict between snapshots and checkpoints

2019-09-28 Thread Richard Damon
On 9/26/19 12:13 PM, Gwendal Roué wrote:
> Hello,
>
> The documentation for sqlite3_snapshot_open() [1] says:
>
>> A call to sqlite3_snapshot_open() will fail to open if the specified
> snapshot has been overwritten by a checkpoint.
>
> And indeed I am able to create a snapshot with sqlite3_snapshot_get() [2],
> then run checkpoints with sqlite3_wal_checkpoint_v2() [3], and finally have
> sqlite3_snapshot_open() fail.
>
> I have a concern, because this turns automatic checkpointing into a mortal
> danger for all snapshots. As soon as a checkpoint is performed, snapshots
> are invalidated, and whoever is relying on them has lost the database state
> he's interested into.
>
> My question is: is it possible to prevent checkpoints from completing
> successfully when a snapshot is alive?
>
> I know this is possible with a "hand-made checkpoint", made of a distinct
> connection which has started a deferred transaction and has performed an
> initial read. Such hand-made checkpoint has SQLITE_CHECKPOINT_RESTART and
> SQLITE_CHECKPOINT_TRUNCATE fail, and are still able to access their
> precious database state after a SQLITE_CHECKPOINT_PASSIVE or a
> SQLITE_CHECKPOINT_FULL.
>
> The behavior of those "hand-made checkpoint" matches well the needs of
> users who want to use WAL without thinking too much about it: they do not
> disable automatic checkpointing, and are guaranteed with a stable access to
> a given database state as long as they need it.
>
> I was wondering if such a behavior is possible with snapshots returned from
> sqlite3_snapshot_get().
>
> If it is not, then I humbly suggest that this feature would be added, and
> am willing to listen to the opinion of SQLite experts on this subject.
>
> Regards,
> Gwendal Roué
>
> [1] https://www.sqlite.org/c3ref/snapshot_open.html
> [2] https://www.sqlite.org/c3ref/snapshot_get.html
> [3] https://www.sqlite.org/c3ref/wal_checkpoint_v2.html

I think you have a fundamental issue here. Snapshots are tied to a
transaction, and as long as that transaction doesn't invalidate the
snapshot, nothing else can. (This is what your hand-made checkpoint gets
right.) The issue is that if you are reusing the connection, you are
sharing the transaction and thus not protecting your snapshot. Once
someone else using that connection ends the transaction, the snapshot is
no longer protected.

The issue with trying to increase the protection on a snapshot is that
sqlite3_snapshot_free() currently isn't defined as protecting the
snapshot; it is merely a memory-management tool. Many applications likely
don't free all their snapshots before allowing them to be invalidated by
ending the transaction, so such a change would break existing programs. I
suppose it could be implemented with a pragma or the like that doesn't
allow the transaction to be committed, or the snapshot invalidated, until
the snapshot is freed.

What that would ultimately do is cause your program to get an error from
SQLite farther down the road, when you did some action on the connection
that would potentially invalidate your precious snapshot, and you would
now have one more source of 'unexplained' error returns.

-- 
Richard Damon



Re: [sqlite] SQLite - macOS

2019-09-23 Thread Richard Damon

> On Sep 23, 2019, at 1:25 PM, Simon Slavin  wrote:
> 
>> On 23 Sep 2019, at 5:53pm, Pierre Clouthier  
>> wrote:
>> 
>> Can anyone explain how to write UTF-8 in SQLite on the Mac?
>> 
>> We use this statement:
>> 
>>  sqlite3_exec("PRAGMA encoding = \"UTF-8\";") 
> 
> This is not a SQLite problem, but a problem with escaping the quotes in 
> language you're using to execute sqlite3 calls.
> 
> What programming language or script system are you using ?  Will it accept 
> any other way to escape the quote marks ?
> 
> It should be possible to find a line of code which works on all platforms 
> your compiler supports.
> 
My guess is that it isn’t a language issue, since it sounds like they are 
working on a multi-platform program. More likely they are reading in a file 
that isn’t UTF-8 and trying to insert it directly (getting a Unicode coding 
error because the data isn’t UTF-8), or the file IS UTF-8 but they don’t 
realize it, so they run a conversion to UTF-8 from some other character set, 
and that conversion throws an unrecognized-character error.


Re: [sqlite] SQLite - macOS

2019-09-23 Thread Richard Damon
Not sure what your code is doing, and you aren’t showing how you are getting 
your 'é' into SQLite, but U+FFFD is the standard replacement character for 
malformed data, so something somewhere is complaining while converting 
something into UTF-8.

Also, you CAN’T have "a UTF-8 value like 0xC3A9", as that is not a byte value. 
U+00E9 expressed in UTF-8 is 0xC3, 0xA9 (two separate bytes); if you talk 
about it as some bigger word, you are no longer talking UTF-8 and are 
introducing all sorts of machine specifics (big vs. little endian, word size, 
etc.).
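An illustration of those byte values in Python (hex shown to sidestep any endianness confusion):

```python
# U+00E9 in UTF-8 is the two-byte sequence C3 A9.
e_acute = "\u00e9".encode("utf-8")
print(e_acute.hex())

# U+FFFD is the replacement character a decoder substitutes for bad
# input; EF BF BD is its UTF-8 encoding, the bytes seen in the database.
replaced = b"\xff".decode("utf-8", errors="replace")
print(replaced == "\ufffd", replaced.encode("utf-8").hex())
```

Seeing EF BF BD on disk therefore means a decode step saw input it did not consider valid and substituted U+FFFD before the data reached SQLite.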

> On Sep 23, 2019, at 12:53 PM, Pierre Clouthier 
>  wrote:
> 
> Can anyone explain how to write UTF-8 in SQLite on the Mac?
> 
> We use this statement:
> 
>   sqlite3_exec("PRAGMA encoding = \"UTF-8\";") 
> This works fine on Windows, but on macOS the data is not being stored 
> correctly.
> 
> The data to be written (passed to SQLite), is formatted in UTF-8.
> 
> For example, 'é' is U+00E9, which in UTF-8 is 0xC3A9. However, in the macOS 
> version of the database, it is converted to 0xEFBFBD, which is U+FFFD, which 
> doesn't make sense.
> 
> -- 
> Progeny Genealogy Inc. 902–681–3102
> Progeny helps you tell the Story of Your Family ™
> 
> 



Re: [sqlite] What is wrong with this SQL?

2019-09-22 Thread Richard Damon
On 9/22/19 6:10 AM, Bart Smissaert wrote:
> Have this query:
>
> SELECT round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) AS AGE,
>AVG(BM.numeric_value) AS avg_Hb_Male,
>AVG(BF.numeric_value) as avg_Hb_Female
> FROM PATIENTS P INNER JOIN NUM_VALUES BM ON(P.ID = BM.ID AND P.SEX = 'Male')
> INNER join NUM_VALUES BF ON(P.ID = BF.ID AND P.SEX =
> 'Female')
> WHERE BF.term_text = 'Haemoglobin estimation' and
>   BM.term_text = 'Haemoglobin estimation'
> GROUP BY AGE
> HAVING AGE > 18 and AGE < 98
> ORDER BY AGE ASC
>
> Which always used to run fine and return rows. There is no error, but there
> are no returned rows. I have checked all the where conditions and they are
> all fine.
> When I leave out either of the NUM_VALUES joins (so either male or female
> patients) then it
> runs fine and returns data as expected.
>
> Something like this runs fine as well and returns the right data:
>
> select p.sex, round(avg(n.numeric_value), 2) as avg_Hb
> from patients p inner join num_values n on(p.id = n.id)
> where n.term_text = 'Haemoglobin estimation'
> group by p.sex
>
> Any idea what could be the problem here?
> Running version 3.22.0.
>
> RBS

If I am reading that right, the only way that query could ever have
returned rows is if INNER JOIN weren't being honored and had been
converted to a LEFT JOIN.

An INNER JOIN only returns records that match BOTH tables, so to pass the
first INNER JOIN a record needs P.SEX = 'Male', and to pass the second it
would need P.SEX = 'Female' at the same time.
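One way to get both averages in a single pass is conditional aggregation instead of the double join; a sketch on a pared-down, invented version of the schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE patients(id INTEGER PRIMARY KEY, sex TEXT);
    CREATE TABLE num_values(id INTEGER, numeric_value REAL);
    INSERT INTO patients VALUES (1, 'Male'), (2, 'Female'), (3, 'Male');
    INSERT INTO num_values VALUES (1, 150), (2, 130), (3, 140);
""")

# One join, with the male/female split moved into conditional aggregates,
# so no single row ever has to satisfy both sex predicates at once.
row = con.execute("""
    SELECT AVG(CASE WHEN p.sex = 'Male'   THEN n.numeric_value END),
           AVG(CASE WHEN p.sex = 'Female' THEN n.numeric_value END)
    FROM patients p JOIN num_values n ON n.id = p.id
""").fetchone()
print(row)
```

AVG ignores the NULLs the non-matching CASE branches produce, so each aggregate sees only its own group's rows.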

-- 
Richard Damon



Re: [sqlite] INSERT vs BEGIN

2019-09-03 Thread Richard Damon
On 9/3/19 7:18 AM, Dominique Devienne wrote:
> On Tue, Sep 3, 2019 at 12:03 PM Rob Richardson 
> wrote:
>
>> I didn't know it is possible to insert multiple rows into a table using a
>> command like this.
>
> Added over 7 years ago: See
> https://www.sqlite.org/changes.html#version_3_7_11 #1
>
>
>> Is this just an SQLite feature, or is this part of the SQL standard?
>
> I suspect it's non-standard, since Oracle does not support it. But it's
> just a guess on my part. --DD
Many databases I have used include it, so I had thought it was standard,
if not universally supported (which is somewhat common with SQL features).
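For reference, the multi-row form in SQLite (available since 3.7.11, per the changelog linked above), shown through Python's sqlite3 wrapper:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(a INTEGER, b TEXT)")

# A single INSERT carrying several rows in one VALUES clause.
cur = con.execute("INSERT INTO t(a, b) VALUES (1, 'x'), (2, 'y'), (3, 'z')")
print(cur.rowcount)
```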

-- 
Richard Damon



Re: [sqlite] .save always errors: database is locked

2019-08-30 Thread Richard Damon
On 8/30/19 9:11 AM, Alexander Gabriel wrote:
> Hi
>
> I do this:
>
>- restart windows10 or macOS and log in (to guarantee no other process
>is using sqlite3)
>- cd to the folder where sqlite3 v3.29.0 is contained
>- type `sqlite3` (windows 10) or `./sqlite3` (macOS) to start sqlite3
>- type `.open test`, followed by `.save test`
>
> Result: `Error: database is locked`
>
> What am I doing wrong?
> How can I change configuration if I can never save changes?
>
> The only time I can save without an error is when I `.save test2` without
> having opened it before.
>
> Alex

Databases are different from things like Word documents. When you issue
commands that modify the database, the file on disk is changed immediately
(subject to being in a transaction that can be rolled back), so you don't
need to 'save' a database after working with it. The '.save' command is
basically similar to a document's 'Save As' command.

Since the database is currently open, saving back over the original copy
can't be done, as the file is open for reading, and it also doesn't need
to be done.
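A sketch of the same idea through Python's sqlite3 wrapper over SQLite's online backup API (the mechanism behind the shell's .backup/.save): committed changes are already part of the database, and "saving" just means copying it elsewhere.

```python
import os
import sqlite3
import tempfile

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t(x INTEGER)")
src.execute("INSERT INTO t VALUES (1), (2)")
src.commit()  # the data is now part of the database itself

# "Save" = copy the live database to a different file.
path = os.path.join(tempfile.mkdtemp(), "copy.db")
dst = sqlite3.connect(path)
src.backup(dst)

copied = dst.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(copied)
```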

-- 
Richard Damon



Re: [sqlite] Unexpected REINDEX behavior.

2019-08-29 Thread Richard Damon
On 8/29/19 10:56 AM, dirdi wrote:
> On 8/29/19 6:10 PM, Keith Medcalf wrote:
>> Do you have sqlite_stat1 or sqlite_stat4 tables present (that is, have you 
>> ever run analyze)?  
> No.
>
>> If the sqlite_stat1 or sqlite_stat4 tables are present, then if you run 
>> ANALYZE; rather than reindexing or dropping/recreating the index, what is 
>> the result?
> If I run ...
>
>> ANALYZE;
>> REINDEX;
>> ANALYZE;
> ... instead of ...
>
>> DROP INDEX `idx1`;
>> CREATE INDEX `idx1` ON `tbl1` (
>> `int`
>> );
> the UPDATE query remains being slow (~36m).
>
One thought: perhaps there is something about the data in the table for
which ANALYZE gives the query planner misleading statistics, so that
dropping and recreating the index (which discards the results of the last
ANALYZE) makes the planner choose a different plan that works better. You
could look at and compare the query plans (EXPLAIN QUERY PLAN) in the two
scenarios.

-- 
Richard Damon



Re: [sqlite] Documentation update request

2019-08-15 Thread Richard Damon

> On Aug 15, 2019, at 10:37 AM, Simon Slavin  wrote:
> 
> 
> 
> says that sqlite_master cannot be changed.
> 
> 
> 
> tells you how to change it.
> 

You under-quote; the FAQ says it "can not be changed (except under 
extra-ordinary conditions)", and those extra-ordinary conditions are a link to 
the second section you mention.


Re: [sqlite] Programming methodology (was DEF CON (wasL A license plate of NULL))

2019-08-14 Thread Richard Damon
On 8/13/19 4:47 PM, Richard Hipp wrote:
> On 8/13/19, Jose Isaias Cabrera  wrote:
>> I see all of you smart programmers using this
>> non-column matching behavior, and I ask myself why?
> Because that's the way Dennis Ritchie did it.  :-)

There are many ways to format code, and many programmers have a strong
preference for the way THEY want it. In many ways this choice is a bit
like religion: sometimes hard to really explain why, but the believer
usually has some ideas about it, and the choice is often firmly held
and hard to change.

I personally like the K&R style, as it is compact and dense, so you can
see more code. Some people dislike it for much the same reason.

While the braces don't align, the closing brace does align with the
beginning of the statement it is closing, and there is nothing else in
that column in between, so scanning up to find it isn't that hard (and
then to the end of that line to see the opening brace).

If the distance is long, I will add a comment after the closing brace
describing the start statement to make it easier to match.

Yes, this format makes it harder to see mismatched braces, but by
compiling often you get a syntax error on mismatched braces, and by
letting your editor find matching braces it tends to be fairly quick to
locate the problem. The key is to compile (or have the editor syntax-check)
often enough that you can't make two opposing errors like this that hide
each other.

-- 
Richard Damon



Re: [sqlite] Backward compatibility vs. new features (was: Re: dates, times and R)

2019-08-11 Thread Richard Damon
On 8/11/19 4:21 PM, Thomas Kurz wrote:
>> I do understand the value of having date/time types in SQLite, but it is not 
>> easy to do while retaining backward compatibility.  It'll have to wait for 
>> SQLite4 or something.
> Actually I do not really understand the point about backward compatibility. 
> Many very useful suggestions are rejected by just citing "backward 
> comatibility".
>
> From my point of view, this is not actually a knock-out-criterium, because:
>
> a) Existing applications would always continue to work, even if using newer 
> versions of sqlite.dll as it should be no problem for any later version that 
> intruduced feature X to continue using any database regardless of whether or 
> not this database actually contains feature X. (This is actual *backward* 
> compatibility.)

The issue for something like a date-time field is how you would indicate
that a field is a date-time field. Due to backwards compatibility it
can't use the term "date" or "time" to trigger that use, as existing
applications use those and expect a different result, based on published
and promised rules.

>
> b) New applications could decide whether or not to make use of any new 
> feature.
>
> c) Of course, an existing application doesn't know how to handle database 
> structures with feature X when using an sqlite.dll from the time before this 
> feature has been introduced. (I would, however, call this *forward* 
> compatibility.) This is true, but on the other hand, one might ask why an 
> arbitrary application actually might want to do this? I have often gotten the 
> response that it is up to the app how to handle data when reading from a 
> database (IIRC, DATE as a matter of fact was the topic of the discussion). So 
> one could as well argue that it is the app's responsibility to use up-to-date 
> libraries when accessing databases. (Note that this applies *only* to an app 
> dealing with *foreign* databases where one anyhow needs to know how to 
> interpret data, so this is no knock-out-problem.)
>
> Someone recently posted about SQLite support for the next 31 (or so) years. 
> Actually I hope this doesn't mean we will have to wait for three decades 
> until new features could be implemented...?!
New features can, and have, been added. The key is that they need to use
syntax that previously was an error (or at least without defined
meaning) to implement it.
>
> Maybe a new subpage could be added to the website, named "proposed features" 
> or similar, just listing what has been proposed including some short 
> description. There have been many great ideas and it would be a pity if they 
> got lost in the depths of the mailing list ;)
>
> Just my 2cts
> Thomas

-- 
Richard Damon



Re: [sqlite] [SPAM?] Re: Explicit "read transaction" with journal_mode=WAL.

2019-07-31 Thread Richard Damon
If (and it is possible for there to be sequencing that enforces it) you KNOW 
that step 1, getting a read transaction (which the OP presumes includes getting 
the lock), occurs before step 2, then step 4 seeing changes from step 3 says 
something is wrong.

Yes, if you can only use the database to communicate, this may not be possible, 
but if the program gets the read transaction itself, and only after getting the 
confirmation starts the process that does the write, it can positively know 
that it ‘started’ the read transaction before the write transaction was 
started. 

SQLite (I believe) documents that the BEGIN statement doesn’t take the read 
lock until the SELECT, so its behavior is as documented, just not as the OP 
desires. For them, it is desired that the beginning of the transaction also 
gets the lock, so that their API point of beginning a transaction marks the 
point that the read data is locked to.

One option, which they are pursuing (but which seems to have been denied), is 
to have SQLite provide a variant of BEGIN that includes getting the read lock. 
Their other option is to just do a dummy SELECT on the database to force 
getting the lock. The callers of their API won’t be able to tell the 
difference, except maybe for a bit of execution time.

> On Jul 31, 2019, at 4:46 PM, Igor Tandetnik  wrote:
> 
>> On 7/31/2019 12:32 PM, test user wrote:
>> In some runtimes, the scheduling of functions is unpredictable, so although
>> you will not have a `sleep 5` in the code, the runtime can produce this
>> effect on loaded systems or with programs with long running sync functions.
>> An example of how you might use this:
>> - Connection 1: Get a read transaction.
>> - Connection 2: Get a write transaction, write a lot of data, commit.
>> - Connection 2: SELECT report summary B.
>> - Connection 1: SELECT report summary A,
>> - Diff A and B to see what changed.
> 
> Suppose you discovered that B and A are in fact the same. How do you know 
> whether that occurred because a) "get a read transaction" is "broken" in that 
> it doesn't actually acquire the lock as you expected, or because b) 
> Connection 2 just happened to write lots of data and commit before connection 
> 1 obtained a read transaction?
> 
> In other words, in your example A == B is possible even if BEGIN worked the 
> way you expect it to work, and grabbed a read lock immediately. Similarly, A 
> != B is possible with BEGIN working the way it does now, if SELECT on 
> Connection 1 just happens to beat the write on Connection 2. It's a matter of 
> timing and scheduling, which you yourself posit is unpredictable.
> 
> So, since both A==B and A!=B are possible with either behavior of BEGIN, why 
> again do you care how BEGIN behaves?
> -- 
> Igor Tandetnik
> 
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] [SPAM?] storing blobs in a separate table

2019-07-31 Thread Richard Damon
On 7/31/19 7:49 AM, Rael Bauer wrote:
> Hi,
>
> I am aware that in sqlite for a given "entity" one should generally
> store blobs in a separate table to the other standard fields (such as
> text/number etc..)
>
> So for maximum efficiency, in such a blob table, is there a problem
> storing multiple blob fields?
>
> E.g.  data1, data2, picture, etc..
>
> or should the blob table only have 1 blob field? (perhaps with a
> second field indicating what is stored in the blob, or store different
> blobs in different tables?)
>
> Thanks
>
> Rael
>
The reason for making a separate blob table is to speed up fetching
records. Blobs tend to be big, and thus they tend to reduce the number
of records that fit in a page, making searching for a record slower; and
if you need data after the blob, SQLite needs to read through the blob to
get to it.

If you have a table with two (or more) blobs, and for some accesses you
want one of them, and for other accesses you want the other, then
putting them into different tables means you don't need to read the
first blob when all you want is the second. If you most of the time want
both of them, then putting them in the same table means once it has
found the first it also has the second. Thus same or different tables
depends on how you will be using them.

Also, this suggestion is for blobs that are somewhat large, whose data
noticeably increases the size of the record. For very small blobs, it
may be better to keep them in the main table.

It is always good to know the WHY behind these rules of thumb, so you
know how to apply them.
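
As a sketch of the layout being described (Python's sqlite3 module; the `doc`/`doc_data` names are invented for illustration): keep the small, frequently searched fields in a narrow table, and the large blob in a side table keyed by the same id.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Narrow "hot" table: many records per page, fast to search.
    CREATE TABLE doc (
        id    INTEGER PRIMARY KEY,
        title TEXT,
        size  INTEGER
    );
    -- Side table: one large blob per row, read only when needed.
    CREATE TABLE doc_data (
        doc_id  INTEGER PRIMARY KEY REFERENCES doc(id),
        content BLOB
    );
""")
con.execute("INSERT INTO doc VALUES (1, 'readme', 4)")
con.execute("INSERT INTO doc_data VALUES (1, ?)", (b"\x00\x01\x02\x03",))

# Listing and searching touch only the narrow table ...
titles = [r[0] for r in con.execute("SELECT title FROM doc ORDER BY id")]
# ... and a blob is paged in only on demand.
blob = con.execute("SELECT content FROM doc_data WHERE doc_id = 1").fetchone()[0]
```

Whether two blobs share one side table or get one table each then follows the access pattern argued above: together if usually fetched together, apart if fetched separately.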


-- 
Richard Damon



Re: [sqlite] [SPAM?] Re: Explicit "read transaction" with journal_mode=WAL.

2019-07-31 Thread Richard Damon
As has been pointed out, your function that is called can do the BEGIN
and then a SELECT that hits the database to force the obtaining of the
read lock. The fact that the BEGIN didn't get the lock is then not seen
by the users of your API.

If at some point a new BEGIN SHARED IMMEDIATE operation becomes
available, you can change your function's internals, and the caller
doesn't know the difference, except that the call perhaps got a bit faster.

On 7/31/19 6:36 AM, test user wrote:
> Quote: What importance does it have for you that it already holds an
> "end-mark?
> Quote: Why would it matter that a writer did write and commit between the
> "reader" BEGIN and its first read?
>
> Im writing a library and would like to have an API where the "read
> transaction" has a clear beginning in time.
>
> BEGIN IMMEDIATE forces a "write transaction", but there is no counterpart
> for a "read transaction".
>
> As an example, the client of this library could:
>
> - A. Obtain a "read transaction", *without running any SELECTs*.
> - B. Complete 20 write transactions in another process.
> - C. Begin reading from the read transaction (A) at the point before the
> transactions had occurred.
>
>
> At the moment, a "read transaction" is only started on the first SELECT.
>
> If a client tries to start a "read transaction" with BEGIN, and that
> returns SQLITE_OK, its not clear that this has not actually begun any
> transaction until the first SELECT query.
>
> This would enable an API like:
>
> const r = await db.startReadTx();
> const w = await db.startWriteTx();
>
> // At this point in the runtime it clear when the transactions have begun,
> and how they will impact other concurrent read/write transactions.
>
>
>
> On Tue, Jul 30, 2019 at 11:40 PM Olivier Mascia  wrote:
>
>>> Le 31 juil. 2019 à 00:22, Keith Medcalf  a écrit :
>>>
>>> I can see where a BEGIN IMMEDIATE SHARED would be useful in non-WAL mode
>> though.  I will grant that there may be cases where it might be useful in
>> WAL mode, even though I cannot think of any.
>>
>> Fully agree.
>>
>> —
>> Best Regards, Meilleures salutations, Met vriendelijke groeten, Mit besten
>> Grüßen,
>> Olivier Mascia
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


-- 
Richard Damon



Re: [sqlite] [SPAM?] Re: Explicit "read transaction" with journal_mode=WAL.

2019-07-30 Thread Richard Damon
On 7/30/19 5:33 PM, Olivier Mascia wrote:
>> Le 30 juil. 2019 à 22:39, test user  a écrit :
>>
>> What I would like is something like BEGIN READ, which will not block
>> writers for its duration.
>>
>> This "read transaction" can see all committed transactions that happened
>> before it, but none after it.
>>
>> At the moment it seems to get this guarantee I will need to do a "BEGIN;
>> SELECT * from sqlite_master LIMIT 1"
> I would drop the "SELECT * from sqlite_master LIMIT 1" or whatever other 
> dummy read.  As long as your explicit transaction, started with "BEGIN" which 
> is a shortcut for "BEGIN DEFERRED", does not actually start reading anything, 
> what importance does it have for you that it already holds an "end-mark"? 
> (See link below for reference to the "end mark" concept).  As soon as it 
> reads anything, it will see the database from that point on, until ROLLBACK 
> or COMMIT, as it was at point of that single first read.
>
> https://www.sqlite.org/wal.html#concurrency
If the only interaction the program has with other parts is via the
database, then yes, it shouldn't be important exactly when the program
'locks' its view of the database. On the other hand, if there might be
other communication channels open, then it might be important that the
program establishes its 'end-mark' at a particular point, before it
really needs to access the database. Doing the SELECT is one way to do
that; adding some variation of BEGIN that immediately takes a SHARED
lock would be another.

-- 
Richard Damon



Re: [sqlite] [SPAM?] Re: [EXTERNAL] Determining valid statement placeholders.

2019-07-22 Thread Richard Damon
But he wants to be able to provide the args as {‘0’: ’one’, ‘10’: ’ten’} and, 
since he only uses positional args 1 and 10, have that be OK, i.e. positional 
args are treated as keyword args, and only those actually used are needed. 
THAT can’t be provided by the API.

> On Jul 22, 2019, at 9:22 AM, Keith Medcalf  wrote:
> 
> 
> I don't see what is so hard.  APSW does it:
> 
>> python
> Python 2.7.16 (v2.7.16:413a49145e, Mar  4 2019, 01:30:55) [MSC v.1500 32 bit 
> (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import apsw
> >>> db = apsw.Connection('')
> >>> db.execute('select ?, ?10;', ('one', 'ten')).fetchone()
> Traceback (most recent call last):
>  File "", line 1, in 
>  File "C:\python\lib\site-packages\Local\newapsw.py", line 39, in execute
>return self.cursor().execute(*args, **kwargs)
> apsw.BindingsError: Incorrect number of bindings supplied.  The current 
> statement uses 10 and there are 2 supplied.  Current offset is 0
 
> 
> The statement required 10 parameters and only 2 were supplied.  ERROR!
> 
> Internally it uses sqlite3_bind_parameter_count to find out how many 
> parameters need binding and requires that the number of "parameters" passed 
> match the number of parameters "expected" when binding positionally.
> 
> >>> db.execute('select :1, :10;', {'2': 'one', '10': 'ten'}).fetchone()
> Row(_0=None, _1=u'ten')
> >>> db.execute('select :1, :10;', {'1': 'one', '10': 'ten'}).fetchone()
> Row(_0=u'one', _1=u'ten')
 
> 
> When binding by name, it looks up the names in the provided dictionary and 
> binds those it finds.
> 
> You can also bind named parameters to a positional list (in which case it is 
> the programmers job to keep track of what they are doing) since a named 
> parameter is merely syntactic sugar on top of positional parameters:
> 
> >>> db.execute('select :1, :10;', ('one', 'two')).fetchone()
> Row(_0=u'one', _1=u'two')
 
> 
> It is not very difficult.  You call sqlite3_bind_parameter_count.  Then if 
> you are binding positionally you make sure there are sufficient positional 
> parameters provided to bind them all.  It you are binding by name, you cycle 
> through the parameters, get the name, and then bind the given named parameter 
> to that parameter.  apsw chooses to ignore missing items when binding by name 
> -- sounds like you simply want to ERROR instead ...
> 
> The only issue I can see is that when requesting the name of an positional 
> parameter that has no name it returns null rather than the positional name, 
> however, this is pretty easy to work around in pretty much any programming 
> language ... that is if sqlite3_parameter_name(stmt, x) return null then the 
> name is a ? followed by x in decimal (sprintf("?%d", x) or thereabouts having 
> appropriate accomodations for the language syntax and buffer safety, etc)
> 
> However, you cannot bind to positional parameters by name:
> 
> >>> db.execute('select ?1, ?10;', {'2': 'one', '10': 'ten'}).fetchone()
> Traceback (most recent call last):
>  File "", line 1, in 
>  File "C:\python\lib\site-packages\Local\newapsw.py", line 39, in execute
>return self.cursor().execute(*args, **kwargs)
> apsw.BindingsError: Binding 1 has no name, but you supplied a dict (which 
> only has names).
> 
> 
> *the builtin python sqlite3 wrapper does the same thing I expect (though I 
> have never actually looked) but it is somewhat braindead otherwise, so I 
> don't use it and haven't bothered to test what it does in these 
> circumstances.  Though my expectation is that it would behave somewhat 
> similarly.
> -- 
> The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
> lot about anticipated traffic volume.
> 
> 
>> -Original Message-
>> From: sqlite-users [mailto:sqlite-users-
>> boun...@mailinglists.sqlite.org] On Behalf Of test user
>> Sent: Monday, 22 July, 2019 06:36
>> To: SQLite mailing list
>> Subject: Re: [sqlite] [EXTERNAL] Determining valid statement
>> placeholders.
>> 
>> I understand the problem, but I want my library to be able to detect
>> the
>> problem programatically.
>> 
>> Currently if a user mixes index-based and key-based placeholders, the
>> only
>> thing a library using SQLite can do us run the query with unbound
>> placeholders set to null (the SQLite default).
>> 
>> Id like the ability to instead throw an error in this case.
>> 
>> 
>> I think SQLite internally knows how many placeholders are in the
>> query at
>> parse time.
>> 
>> My question is how can I get the data via the API, or if it would be
>> considered to add a function to get this data?
>> 
>> 
>> On Mon, Jul 22, 2019 at 12:29 PM Keith Medcalf 
>> wrote:
>> 
>>> 
>>> On Monday, 22 July, 2019 04:34, Enzo 
>> wrote:
>>> 
 It is not the same information.
>>> 
 I want to be able to determine "has the user bound all placeholder
 values with data?".
>>> 
 The user provides this as input:
>>> 
 query="SELECT ?, ?10"
 data={"0": 

Re: [sqlite] [SPAM?] Re: [EXTERNAL] Determining valid statement placeholders.

2019-07-22 Thread Richard Damon
The problem is you are defining your problem differently than SQLite does, so 
it can’t help you.

To SQLite, a query with ?10 in it has (at least) 10 positional parameters, and 
if you are requiring that the user explicitly provide values for all 
parameters, then (by SQLite's definition) they will need to provide positional 
parameters 1 through 10 (even if some are not used). 

You seem to want to redefine that in this case, if a parameter isn’t actually 
used, then it doesn’t need to be provided. To do this, my guess is you are 
going to need to parse the SQL yourself (at least partially) to detect what 
parameters are used.

Note, perhaps part of the issue is that you seem to want to call ?10 as a key 
based placeholder, but to SQLite it is an index based placeholder, and implies 
that indexes 1 through 9 also exist.
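
The distinction shows up even in Python's built-in sqlite3 wrapper (a sketch; other wrappers may report the error differently): ?10 creates ten positional slots even though only two are referenced, while named parameters create only the slots actually used.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# "?1 ... ?10" declares POSITIONAL parameters 1 through 10, so binding
# just two values is an error even though only two slots are referenced:
try:
    con.execute("SELECT ?1, ?10", ("one", "ten"))
    sparse_bind_ok = True
except sqlite3.ProgrammingError:
    sparse_bind_ok = False

# Named parameters declare only what the statement actually uses:
row = con.execute("SELECT :lo, :hi", {"lo": "one", "hi": "ten"}).fetchone()
```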

> On Jul 22, 2019, at 8:36 AM, test user  wrote:
> 
> I understand the problem, but I want my library to be able to detect the
> problem programatically.
> 
> Currently if a user mixes index-based and key-based placeholders, the only
> thing a library using SQLite can do us run the query with unbound
> placeholders set to null (the SQLite default).
> 
> Id like the ability to instead throw an error in this case.
> 
> 
> I think SQLite internally knows how many placeholders are in the query at
> parse time.
> 
> My question is how can I get the data via the API, or if it would be
> considered to add a function to get this data?
> 
> 
>> On Mon, Jul 22, 2019 at 12:29 PM Keith Medcalf  wrote:
>> 
>> 
>>> On Monday, 22 July, 2019 04:34, Enzo  wrote:
>>> 
>>> It is not the same information.
>> 
>>> I want to be able to determine "has the user bound all placeholder
>>> values with data?".
>> 
>>> The user provides this as input:
>> 
>>> query="SELECT ?, ?10"
>>> data={"0": "data-a", "10": "data-b"}
>> 
>>> Note: This IS valid, as they have provided all data for placeholders.
>> 
>>> Using the APIs you mentioned:
>> 
>>> count=10
>>> name(10) = "?10"
>> 
>>> So according to the API, the indexes [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> are placeholders in the query.
>> 
>>> As you can see, only indexes [1, 10] are valid placeholders in the
>>> query.
>> 
>>> So, as far as I can tell, it is not possible to get this from the
>>> API:
>> 
>>> query="SELECT ?, ?10"
>>> valid_placeholder_indexes=[1, 10]
>> 
>>> It is only possible to get this:
>> 
>>> query="SELECT ?, ?10"
>>> maybe_placeholder_indexes=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>> 
>> The fundamental issue is that you are confusing POSITIONAL parameters with
>> NAMED parameters.  ?nnn where nnn is numeric describe POSITIONAL
>> parameters, so in the query SELECT ?, ?10 you are using positional
>> parameter 1 and positional parameter 10.  That implies that there are 10
>> positional parameters.  The fact that you are only using those positional
>> parameters sparsely (only using positional parameter 1 and positional
>> parameter 10) does not mean that you are using two parameters, it means
>> that you are using 10, but only referencing 2 of them.
>> 
>> Said another way, you are declaring a function that looks like this:
>> 
>> int stmt(a, b, c, d, e, f, g, h, i, j)
>>   return a + j
>> 
>> and expecting to be able to call it as
>> 
>> z = stmt(1, 6)
>> 
>> expecting some magic to know that the second parameter is really parameter
>> 10.
>> 
>> https://www.sqlite.org/lang_expr.html#varparam
>> 
>> There may be many positional parameters (like 999 in the default build)
>> and sqlite3_bind_parameter_count returns the "index" of the greatest
>> parameter number used in the statement.  Having created 10 parameters
>> sqlite has absolutely no clue that you happen to be using only parameter 1
>> and parameter 10.  If you only needed 2 parameters you should have only
>> created 2, not 10.  That is what NAMED parameters are for.
>> 
>> If you change from using positional (?) parameters to using named (:) or
>> (@) what happens?
>> 
>> select :1, :10; should only create 2 parameters named :1 and :10 ...
>> 
>> --
>> The fact that there's a Highway to Hell but only a Stairway to Heaven says
>> a lot about anticipated traffic volume.
>> 
>> 
>> 
>> 
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> 
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] I can insert multiple rows with the same primary key when one of the value of the PK is NULL ...

2019-07-19 Thread Richard Damon
One big issue is that, in general (as I remember it), pragmas affect the 
connection, not the database itself, so they shouldn't change how the schema is 
interpreted; otherwise another connection (or the same connection before 
issuing the pragma) might interpret things differently and possibly see the 
database as somehow corrupt, or be able to corrupt the database.
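
The quirk that started this thread can be demonstrated in a few lines (a Python sqlite3 sketch): on an ordinary rowid table SQLite accepts NULL in PRIMARY KEY columns, and NULLs count as distinct for uniqueness, while a WITHOUT ROWID table enforces NOT NULL on its key.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Ordinary (rowid) table: the historical behavior allows NULL in the
# PRIMARY KEY, and distinct NULLs let "duplicate" keys coexist.
con.execute("CREATE TABLE t (a, b, PRIMARY KEY (a, b))")
con.execute("INSERT INTO t VALUES (1, NULL)")
con.execute("INSERT INTO t VALUES (1, NULL)")   # no conflict!
dupes = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# WITHOUT ROWID tables enforce PRIMARY KEY = UNIQUE NOT NULL:
con.execute("CREATE TABLE t2 (a, b, PRIMARY KEY (a, b)) WITHOUT ROWID")
try:
    con.execute("INSERT INTO t2 VALUES (1, NULL)")
    null_key_ok = True
except sqlite3.IntegrityError:
    null_key_ok = False
```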

> On Jul 19, 2019, at 1:44 PM, Thomas Kurz  wrote:
> 
> Imho it would be helpful (especially for newbies that don't know the full 
> history of SQLite) to have a "PRAGMA strict_mode" or similar, to disable all 
> kinds of historical bugs. They might be relevant for existing applications 
> but in no way for newly created ones. Among the things to consider should be:
> 
> - PRIMARY KEY = UNIQUE NOT NULL (what I have learned now *g*)
> - enable FOREIGN KEY constraints (I know there is already a pragma, but imho 
> it should be included)
> - strict type enforcement
> - disable the use of double quotes for strings
> - default to WITHOUT ROWID
> 
> ...and probably many more I don't know about ;-)
> 
> 



Re: [sqlite] [SPAM?] Re: Grammar police

2019-07-11 Thread Richard Damon
On 7/11/19 3:45 PM, Ned Fleming wrote:
> On 2019-07-11 2:31 PM, Carl Edquist wrote:
>>> Ginger tells me that "a historical" is technically correct,
>>
>> AFAICT, "an historical" is correct iff the "h" in "historical" is
>> silent.
>>
>> Eg, "It's an 'istorical oversight to pronounce the 'h' in 'historical'."
>>
>
> From the New Oxford American Dictionary entry for "an" --
>
> "usage: Is it ’a historical document’ or ’an historical document’? ‘A
> hotel’ or ‘an hotel’? There is still some divergence of opinion over
> which form of the indefinite article should be used before words that
> begin with h- and have an unstressed first syllable. In the 18th and
> 19th centuries, people often did not pronounce the initial h for these
> words, and so an was commonly used. Today the h is pronounced, and so
> it is logical to use a rather than an. However, the indefinite article
> an is still encountered before the h in both British and American
> English, particularly with historical: in the Oxford English Corpus
> around a quarter of examples of historical are preceded with an rather
> than a."
>
> An is fading in this usage but certainly still acceptable.
>
Yes, the a / an distinction is largely based on the following sound:
'an' if it is a vowel, 'a' if it is not (the n sound breaks up the
double vowel). Words that begin with an initial unstressed h-vowel might
not have the h really vocalized, so the following sound is vowel-like
and takes 'an' (there isn't enough h to break up the vowel cluster).
This can largely be affected by the accent one speaks with (which can be
related to, but different from, the dialect).

-- 
Richard Damon



Re: [sqlite] [SPAM?] Re: [SPAM?] Is WAL mode serializable?

2019-07-10 Thread Richard Damon
On 7/10/19 7:54 AM, Andy Bennett wrote:
> Hi,
>
>>> Thanks for the detailed explanation of how it works. What I meant to
>>> ask was "is there really a difference in the *semantics*?". i.e. from
>>> the user perspective, can databases in the two different modes end up
>>> in different states?
>>
>> My understanding is NO, as long as the same transactions complete.
>
> Thanks! That explanation is really great.
>
>
>> In both cases, if a read transaction attempting to upgrade to a write
>> transaction gets a busy, it needs to end the transaction and re-do its
>> reads before it can do its write.
>
> This is assuming that you have the defauly busy handler engaged?
>
> Best wishes,
> @ndy
>
A proper busy handler will wait when it makes sense to get the needed
locks, so you don't get a spurious busy that you could just retry and
then be able to continue.

The fundamental issue would be a process that starts with just a read
lock, and after reading some data realizes that it needs to update
something based on what it read. If when upgrading to the write lock,
you get a busy that tells you that you will NEVER be able to upgrade,
then you need to close the transaction, and when you restart, you should
forget the previous data you read (at least as it relates to what you
might want to write) as it might have changed, so you need to read it
again to be sure.

-- 
Richard Damon



Re: [sqlite] [SPAM?] Re: Is WAL mode serializable?

2019-07-10 Thread Richard Damon
On 7/10/19 7:05 AM, Andy Bennett wrote:
>
> Thanks for the detailed explanation of how it works. What I meant to
> ask was "is there really a difference in the *semantics*?". i.e. from
> the user perspective, can databases in the two different modes end up
> in different states?

My understanding is NO, as long as the same transactions complete.

There is a significant difference in when a write transaction can start,
so code that doesn't wait properly might take different paths and thus
you end up in a different state.

In WAL mode, basically one write transaction can always be started, no
matter how many read transactions might be present, and once a write
transaction starts, no other existing read transaction can upgrade
itself to a write.

In non-WAL mode, the write transaction will need to wait for all (other)
read transactions to end before it can start, and if while it is waiting
another read transaction tries to upgrade it will be denied, but once it
starts, there can't be another read transaction to attempt an upgrade.

In both cases, if a read transaction attempting to upgrade to a write
transaction gets a busy, it needs to end the transaction and re-do its
reads before it can do its write. The differences is that in WAL mode,
one transaction doesn't need to wait for all the reads to finish before
it can start.

-- 
Richard Damon



Re: [sqlite] [EXTERNAL] Re: Should SQLite distinguish between +0.0 and -0.0 on output?

2019-06-14 Thread Richard Damon
On 6/14/19 7:15 AM, R Smith wrote:
>
> On 2019/06/14 4:23 AM, Richard Damon wrote:
>> On 6/13/19 10:51 AM, R Smith wrote:
>>> On 2019/06/13 4:44 PM, Doug Currie wrote:
>>>>> Except by the rules of IEEE (as I understand them)
>>>>>
>>>>> -0.0 < 0.0 is FALSE, so -0.0 is NOT "definitely left of true zero"
>>>>>
>>>> Except that 0.0 is also an approximation to zero, not "true zero."
>>>>
>>>> Consider that 1/-0.0 is -inf whereas 1/0.0 is +int
>>>
>>> I do not know if this is the result case in any of the programming
>>> languages, but in Mathematical terms that is just not true.
>>>
>>> 1/0.0 --> Undefined, doesn't exist, cannot be computed, Should error
>>> out. Anything returning +Inf or -Inf is plain wrong.
>>> I posit the same holds true for 1/-0.0
>> Yes, 1.0/0.0 is undefined in the Field of Real numbers, but IEEE isn't
>> the field of Real Numbers. First, as pointed out, it has limited
>> precision, but secondly it have values that are not in the field of Real
>> Numbers, namely NaN and +/-Inf.
>>
>> Note, that with a computer, you need to do SOMETHING when asked for
>> 1.0/0.0, it isn't good to just stop (and traps/exceptions are hard to
>> define for general compution systems), so defining the result is much
>> better than just defining that anything could happen. It could have been
>> defined as just a NaN, but having a special 'error' value for +Inf or
>> -Inf turns out to be very useful in some fields.
>
> I wasn't advocating to do something weird when the value -0.0 exists
> in memory - the display of that is what the greater idea behind this
> thread is[**].
>
> What I was objecting to, is claiming (in service of suggesting the
> use-case for -0.0), that the mathematical result of 1/-0.0 IS in fact
> "-Inf" and so computers should conform, when it simply isn't, it's an
> error and SHOULD be shown so. Neither is the mathematical result of
> 0/-1 = -0.0. It simply isn't mathematically true (or rather, it isn't
> distinct from 0.0), and I maintain that any system that stores -0.0 as
> the result of the computation of 0/-1 is simply doing so by virtue of
> the computational method handling the sign-bit separate from the
> division and being able to store it like so by happenstance of IEEE754
> allowing -0.0 as a distinct value thanks to that same sign bit, and
> not because it ever was mathematically necessary to do so.
>
> I'll be happy to eat my words if someone can produce a mathematical
> paper that argued for the inclusion of -0.0 in IEEE754 to serve a
> mathematical concept. It's a fault, not a feature.
>
>
> [** As to the greater question of representation - In fact I'm now a
> bit on the fence about it. It isn't mathematical, but it does help
> represent true bit-data content. I'm happy with it both ways.]

I was pointing out that the proper result of 1/0 depends on WHICH type of
mathematics you are talking about. If you have your mind wrapped around
the idea that 'Floating Point' == 'Real Numbers', then it doesn't make
sense, but floating point is at best a rough approximation: it cannot
express all of the Reals, and it also expresses some things that are
outside the domain of the Reals. A simple example: 1.0 / 3.0 * 3.0 - 1.0
should be exactly 0.0 in the domain of real numbers, but there is no such
guarantee in the domain of Floating Point numbers (because 1.0 / 3.0 can
not be exactly represented). You can't even say the results will be
'close' to zero, as there is no expressible tolerance based just on the
final expected answer: you could replace the 1.0 with a billion, or a
billionth, and get very different values, all of which would need to be
considered 'close'.
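A quick sketch in Python (used here simply as a convenient IEEE-754 double environment) illustrates both points: the stored quotient is not exactly one third, and the spacing between adjacent doubles (the ulp) grows with magnitude, so no single "close to zero" tolerance exists:

```python
import math
from fractions import Fraction

third = 1.0 / 3.0
# Fraction(float) recovers the exact binary value stored in the double
assert Fraction(third) != Fraction(1, 3)   # 1/3 is not representable in base 2

# The spacing between adjacent doubles (ulp) scales with magnitude,
# so an acceptable "close" depends on the magnitudes in the calculation.
assert math.ulp(1.0) < math.ulp(1e9) < math.ulp(1e18)
```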

My understanding is that IEEE COULD have defined 1/0 as NaN instead, but
there were significant areas of numerical calculation where remembering
the infinity, and then using facts like n / inf being 0 (or atan(inf)
being pi/2), gave meaningful answers in some cases.
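Python's math module, standing in for any IEEE-754 implementation, shows the useful behavior being described:

```python
import math

inf = math.inf
assert 1.0 / inf == 0.0                 # n / inf is 0
assert math.atan(inf) == math.pi / 2    # atan(inf) = pi/2
assert math.isnan(inf - inf)            # but inf - inf is still NaN
```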

-- 
Richard Damon

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [SPAM?] Re: [EXTERNAL] Re: Should SQLite distinguish between +0.0 and -0.0 on output?

2019-06-13 Thread Richard Damon
On 6/13/19 10:51 AM, R Smith wrote:
> On 2019/06/13 4:44 PM, Doug Currie wrote:
>>>
>>> Except by the rules of IEEE (as I understand them)
>>>
>>> -0.0 < 0.0 is FALSE, so -0.0 is NOT "definitely left of true zero"
>>>
>> Except that 0.0 is also an approximation to zero, not "true zero."
>>
>> Consider that 1/-0.0 is -inf whereas 1/0.0 is +int
>
>
> I do not know if this is the result case in any of the programming
> languages, but in Mathematical terms that is just not true.
>
> 1/0.0 --> Undefined, doesn't exist, cannot be computed, Should error
> out. Anything returning +Inf or -Inf is plain wrong.
> I posit the same holds true for 1/-0.0 

Yes, 1.0/0.0 is undefined in the Field of Real numbers, but IEEE isn't
the field of Real Numbers. First, as pointed out, it has limited
precision, but secondly it has values that are not in the field of Real
Numbers, namely NaN and +/-Inf.

Note that, with a computer, you need to do SOMETHING when asked for
1.0/0.0; it isn't good to just stop (and traps/exceptions are hard to
define for general computation systems), so defining the result is much
better than just defining that anything could happen. It could have been
defined as just a NaN, but having a special 'error' value for +Inf or
-Inf turns out to be very useful in some fields.

-- 
Richard Damon



Re: [sqlite] [SPAM?] Re: [EXTERNAL] Re: Should SQLite distinguish between +0.0 and -0.0 on output?

2019-06-13 Thread Richard Damon
On 6/13/19 1:29 AM, Hick Gunter wrote:
>> 2.  Math.  Negative zero is not a mathematical concept.
> -0.0 is "too close to distinguish from, but definitely left of true zero", 
> which may be significant in some context
>
Except by the rules of IEEE (as I understand them)

-0.0 < 0.0 is FALSE, so -0.0 is NOT "definitely left of true zero"
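This is easy to confirm in Python (one IEEE-754 implementation): -0.0 compares equal to 0.0, yet the sign bit is still observable through functions that honor it:

```python
import math

assert not (-0.0 < 0.0)                   # comparison treats them as equal
assert -0.0 == 0.0
assert math.copysign(1.0, -0.0) == -1.0   # but the sign bit is there
assert math.atan2(0.0, -0.0) == math.pi   # and some functions honor it
assert math.atan2(0.0, 0.0) == 0.0
```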


-- 
Richard Damon



Re: [sqlite] [SPAM?] Re: round function inconsistent

2019-05-27 Thread Richard Damon
On 5/27/19 11:36 AM, Jose Isaias Cabrera wrote:
> James K. Lowden, on Sunday, May 26, 2019 04:51 PM, wrote...​
>> On Fri, 24 May 2019 13:10:49 +​
>> Jose Isaias Cabrera  wrote:​
>> ​
>>>> Consider these two queries:​
>>>> ​
>>>>   SELECT round(3.255,2);​
>>>>   SELECT round(3.2548,2);​
>>>> ​
>>>> Do you expect them to give different answers?​
>>> ​
>>> 3.26​
>>> 3.25​
>> ​
>> sqlite> SELECT cast(100 * (0.005 + 3.2548) as​
>>   ...> integer)/100.0;​
>> 3.26​
> ​
> Ok, I think it happens even before the casting.  This should be, 
> 3.2598, and yet, it's 3.26.​
> ​
> 11:23:07.55>sqlite3​
> SQLite version 3.28.0 2019-04-16 19:49:53​
> Enter ".help" for usage hints.​
> Connected to a transient in-memory database.​
> Use ".open FILENAME" to reopen on a persistent database.​
> sqlite> SELECT 0.005 + 3.2548;​
> 3.26​
> ​
> I even tried casting it as real,​
> ​
> sqlite> SELECT cast(0.005 + 3.2548 as real);​
> 3.26​
> sqlite> SELECT cast(3.2548 + 0.005 as real);​
> 3.26​
> sqlite> SELECT cast((3.2548 + 0.005) as real);​
> 3.26​
> sqlite>

One thing to watch out for when using floating point numbers is that none
of the numbers mentioned in this example are exactly what was typed, so
everything has internal rounding adjustments, and even the output has
been rounded and doesn't express the exact number that was the result
(there is no value exactly equal to 3.26 as an IEEE floating point number).

IEEE floating point can only represent numbers that can be written as an
integer times an integer power of 2. Most decimal numbers can not be, due
to the factor of 5 in the decimal base of 10. Only if the decimal part
of the number is a multiple of 5 to the power of the number of decimal
digits is the number possibly representable (there is still a precision
limit).

A quick test for this is that all floating point values that are exactly
representable are either:

Integers, or numbers whose decimal part ends in 5.

If they have more than 1 decimal digit, then they end in 25 or 75 (or 50).

If they have more than 2 decimal digits, then they end in 125, 375, 625,
or 875 (or 500, 250, 750).

You can extend these patterns to more digits, but these flag over 99% of
decimals as not having an exact representation, so they tend to be good
enough.
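One way to make that check exact, sketched in Python: Decimal(float) recovers the exact binary value a double holds, so comparing it against the decimal literal tells you whether the literal was representable:

```python
from decimal import Decimal

def is_exact(literal: str) -> bool:
    """True if the decimal literal is exactly representable as a double."""
    return Decimal(float(literal)) == Decimal(literal)

# Decimal parts that are powers-of-two fractions are exact...
assert is_exact('0.5') and is_exact('0.25') and is_exact('0.375')
# ...most other decimals are not
assert not is_exact('0.1') and not is_exact('3.26')
```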

The ultimate answer is that when dealing with floating point, you REALLY
need to KNOW what you are doing or you can easily get answers that seem
wrong (but actually follow the rules).

This is why many business applications, and the languages supporting
them, adopted a 'Decimal' number type that stores numbers with decimal
(base 10) fractions, to get around the fact that as people we are used
to thinking in base 10.
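Python's decimal module (one example of such a base-10 type) makes the contrast concrete:

```python
from decimal import Decimal

# Base-10 arithmetic: exact for decimal fractions
assert Decimal('0.1') + Decimal('0.2') == Decimal('0.3')

# Binary floating point: the same sum picks up rounding error
assert 0.1 + 0.2 != 0.3
```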

-- 
Richard Damon



Re: [sqlite] How to set access permissions to protect a database file?

2019-05-27 Thread Richard Damon
On 5/27/19 9:16 AM, Simon Slavin wrote:
> On 27 May 2019, at 3:33am, Adrian Ho  wrote:
>
>> The OP wants *all users* to be able to update (write) the DB via the Tcl 
>> script reading_room.tcl, but *not* by (say) running the SQLite shell or 
>> something else. In your setup, as long as a specific user has write 
>> permissions, *every program* the user runs can write to the DB.
> Some work in this thread has been because the file is a SQLite database.  But 
> it's not really a SQLite question.  It's more about the access/permissions 
> model of Debian 9.  What OP wants can be reduced to a simpler situation:
>
> "I have a text file.  It's on a computer running Debian 9.  I want to make 
> sure that this text file can be read/written by multiple people, but that it 
> can be read/written only using this program I wrote."
>
> I've never used Debian so I can't solve the problem.  But from what little I 
> remember of Linux, one solution is to create a special account for that one 
> operation, and set up that account in a non-standard way.

Actually, It can be an SQLite question, as another way to solve the base
problem is to do something to the database so that only 'authorized'
applications can access/modify it. One way to do that is to use the SEE
extension and encrypt the database. If you do that then you no longer
have the equivalent of a 'text file', so unauthorized applications can't
access the file.

It isn't perfect protection, because someone still will have the ability
to delete/overwrite the file, to protect from that seems to need the
protection method you describe, but if you are only trying to protect
against Murphy, and not Machiavelli (as the saying goes) it may be a
viable, and portable, solution.

-- 
Richard Damon



Re: [sqlite] [SPAM?] Re: round function inconsistent

2019-05-24 Thread Richard Damon
On 5/24/19 8:35 AM, Jose Isaias Cabrera wrote:
> If IEEE754 can't figure out how to round 3.555 to 2 decimals, I don't know if 
> I'd trust it. ;-) thanks.

Just to point out a simple fact that seems to have been glossed over a
bit. In IEEE754 there is NO number with a value of exactly 3.555 (or
3.56), as neither of these numbers can be expressed as an integer value
times a power of 2. It defines that, given these numbers as input, the
conversion routine should generate one of the two exactly representable
numbers nearest to that value (one beginning 3.55499... and the other
beginning 3.55500...), preferably the one that is closest. Which of these
two numbers was chosen to represent the value affects what the proper
answer should be, which would be either 3.55 or 3.56, and then you run
into the issue that neither of THOSE values can be exactly represented,
so the final answer will be the number either immediately below or above
that value.

Now, you also can't just go by what is printed for those numbers, as the
default format usually does some rounding even if you don't explicitly
round, to avoid a lot of 9s being output.

One way to show this issue is to do something like (3.555 * 1000) - 3555,
which should be 0 if 3.555 could be exactly represented, but with binary
floating point there is no guarantee it will be. (The 1000 and 3555,
being reasonably sized integers, can be exactly represented.)
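A hedged sketch in Python: Decimal(float) prints the exact value the double actually holds, confirming directly that neither 3.555 nor 3.56 (nor 3.55) is stored exactly, without relying on how any particular arithmetic rounds:

```python
from decimal import Decimal

for literal in ('3.555', '3.56', '3.55'):
    stored = Decimal(float(literal))   # exact binary value of the double
    assert stored != Decimal(literal)  # none of these are representable
    print(literal, '->', stored)       # shows the long true value
```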

This isn't so much a problem with IEEE754 as a fundamental mismatch
between binary floating point and decimal numbers, which is why many
business programs don't use floating point, but decimal arithmetic, or
scale things to keep the important digits as integers (do your math in
pennies, or a fraction of a penny, so the binary issue isn't an issue any
longer, since 0.5 IS exactly representable, making the rounding point
exact).

-- 
Richard Damon



Re: [sqlite] Getting the week of the month from strftime or date functions

2019-05-06 Thread Richard Damon
On 5/6/19 1:58 PM, Jose Isaias Cabrera wrote:
> Petite Abeille, on Sunday, May 5, 2019 09:10 AM, wrote...
>>> On May 4, 2019, at 21:24, Thomas Kurz  wrote:
>> True enough, even though one could convert a 'week of year' into a 'week of 
>> month':
> [clip]
>
>> 2019-11-30|2019|11|47|5
>> 2019-12-01|2019|12|47|1  <--
>> 2019-12-02|2019|12|48|2
>> 2019-12-03|2019|12|48|2
>> 2019-12-04|2019|12|48|2
> Not that I want to continue with this subject, but something is wrong where 
> one week only has 1 day.  I know all the arguments about dates and countries 
> and, etc., but we have discover DNA; shouldn't we have the knowledge to come 
> up with a dating system that should work for the world. :-)  Yes, I know.  
> It's probably why we still have wars.  But, I am one of those that believes 
> that someday, as John Lennon sang, "...the world will live as one." ;-)
>
> Happy dating...
>
> josé
It depends a lot on how you want to define a 'week' and what you are
going to use it for. If you are printing a traditional calendar, a
one-day week makes a lot of sense: the first week is the first row of the
calendar, the second week is the second row, and so on.

-- 
Richard Damon



Re: [sqlite] Problem with REAL PRIMARY KEY

2019-05-04 Thread Richard Damon
On 5/4/19 12:36 PM, Manuel Rigger wrote:
> Hi everyone,
>
> Consider the following example:
>
> CREATE TABLE t1 (c0, c1 REAL, PRIMARY KEY (c1, c0));
> INSERT INTO t1(c1) VALUES (0X7ff);;
> SELECT ALL * FROM t1 WHERE c1 = 5.76460752303423e+17;
>
> I would expect the row to be fetched, which is not the case.
>
> I confirmed that the real value stored is indeed equal to the constant:
>
> sqlite> SELECT *, typeof(c1) FROM t1;
> |5.76460752303423e+17|real
>
> This is not only the case when using a literal, but also when querying the
> value stored in the row in a sub query. So the following expression also
> does not fetch the row:
>
> SELECT * FROM t1 WHERE c1 IN (SELECT c1 FROM t1);
>
> It seems that the PRIMARY KEY causes the bug. When it is removed (or one of
> the columns), the row is returned as expected.
>
> Best,
> Manuel

5.76460752303423e+17 == 576460752303423000 which is even
0X7ff is odd
They can't be the same value!

Now, depending on how you convert things, they may be close enough to round to 
the same value if the calculation ends up being done as a real, but that is 
down to implementation details (a 'Double' has fewer than 59 significant bits, 
so it can not express the value 0x7ff)
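A sketch in Python of the precision limit (the constant here is 2**59 - 1, assumed as the kind of large odd value under discussion; the literal in the original report appears truncated in the archive):

```python
big_odd = 2**59 - 1        # 59 significant bits, and odd
as_real = float(big_odd)   # a double keeps only 53 bits of mantissa

assert as_real == 2**59    # the low bits are rounded away, giving an even value
assert int(as_real) != big_odd   # so the REAL can never equal the odd integer
```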

-- 
Richard Damon



Re: [sqlite] Help with sqlite3_value_text

2019-04-12 Thread Richard Damon


> On Apr 12, 2019, at 12:58 PM, x  wrote:
> 
> I’ve been asking myself if I could have done the above more efficiently as 
> sqlite’s converting the original string then I’m converting it and copying 
> it. While thinking about that I started to wonder how c++ handled utf8/16. 
> E.g. To access the i’th character does it have to rattle through all previous 
> I-1 characters to find the start of character i, how pointer arithmetic was 
> handled when pointing to utf8/16 chars etc.
> 

Basically, if you are dealing with a variable width encoding (UTF-8/UTF-16), 
then finding the nth character requires scanning the string counting beginnings 
of characters. If this is an important operation, you pay the cost of 
conversion and work in UCS-4. On the other hand, UTF-8 has a lot of nice 
properties such that it can be a fairly seamless upgrade for processing plain 
ASCII text, and is reasonably efficient for typical text. (There are a number 
of complications if you try to support ALL of Unicode, like composed 
characters, where several code points together define a single character; 
there you need to decide how you want to normalize, and you need big character 
tables with the instructions for how to do this.)
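A minimal Python sketch of that scan: every UTF-8 byte that is not a continuation byte (top bits 10) starts a new code point, so finding the n-th code point means counting lead bytes from the start:

```python
def nth_codepoint_offset(buf: bytes, n: int) -> int:
    """Return the byte offset where the n-th (0-based) code point starts."""
    count = -1
    for i, b in enumerate(buf):
        if (b & 0xC0) != 0x80:   # not a continuation byte: starts a code point
            count += 1
            if count == n:
                return i
    raise IndexError(n)

buf = "héllo".encode('utf-8')    # 'é' occupies two bytes
assert nth_codepoint_offset(buf, 0) == 0   # 'h'
assert nth_codepoint_offset(buf, 2) == 3   # 'l' (byte 2 is inside 'é')
```

Note this counts code points, not user-perceived characters; combining sequences would need the normalization tables mentioned above.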


Re: [sqlite] Remove row to insert new one on a full database

2019-04-05 Thread Richard Damon
On Apr 5, 2019, at 12:31 PM, James K. Lowden  wrote:
> 
> On Fri, 5 Apr 2019 15:45:10 +0300
> Arthur Blondel  wrote:
> 
>> The data is always the same. That's why removing one row should be
>> enough to insert a new one.
>> My problem is that some times I need to remove many rows to add one
>> new one.
> 
> SQLite *could* avoid that problem by pre-allocating space in the
> journal sufficient to permit a single row to be deleted.  But it's not
> obvious to me that the complexity is worth it, given the size of disks
> these days and consequent rarity of the problem.  
> 
> If I were in your shoes, I'd consider maintaining a "dummy" file that's
> expendable in the event of a SQLITE_FULL error.  
> 
> Compute how much space SQLite needs to delete a row.  Maybe double that
> for safety's sake. Create a file that size, and fill it with deadbeef
> just to be sure.  Write functions to create and delete that file,
> because you'll want to do it consistently.  
> 
> When you encounter SQLITE_FULL, delete the file, do the deed, and
> re-create the file.  If you can't recreate the file, you have an
> unrecoverable error, but an intact database.  
> 
> It's not a perfect solution.  To guard against other processes seizing
> the space while you're trying to use it, you'd have to wall off the
> space, maybe with a loopback filesystem.  But it'd get you further down
> the road than you are now.  
> 
> --jkl
> 
First, the OP has indicated that the FULL message isn’t because the disk is out 
of space; they have done something to put a hard limit on the size of the 
database itself, so there is still room to create the journal file needed to 
delete the row.

Second, I am not sure SQLite can know exactly how much space will be needed to 
delete any arbitrary row in the database (or at least be able to figure it out 
cheaply). Remember, it needs to save everything that is going to be changed, 
including the indexes.

Also, deleting one row may not actually free up any useful space, as has been 
shown, though if you can delete one row, you could commit that transaction and 
then delete another (though I can’t be sure whether there could be a corner 
case where deleting a row might increase the size of the database, maybe if 
some trigger fires?)


Re: [sqlite] Remove row to insert new one on a full database

2019-04-05 Thread Richard Damon
On 4/5/19 11:14 AM, Arthur Blondel wrote:
> I have enough disk space. I just limit the database file size artificially
> for testing purpose as you can see.
> There is no problem of privilege and there is nothing else than the code I
> sent. No other access to the DB.
> I'm using sqlite 3.16.2

As has been pointed out, one issue is that not all records, even if 'the
same', take the same space, so deleting one record may not make enough
room for another.

Another issue is that for indexes, not all free space is the same:
indexes keep similar values together in the index, so adding a row may
need to find related space for an index, or you need to delete enough
rows to either open space where needed or to free a full page of the
index so that page can be used where the index newly needs space.

I will admit that these are somewhat esoteric, implementation-dependent
details, so they might not seem obvious, but they do explain the
'strangeness' that you see. Many data structures, when running at a
capacity limit, can demonstrate these sorts of strangeness.

Trying to tightly control resource usage is a tricky problem, and
sometimes you need to think carefully about what your goal actually is
(not what solution you think will work). Putting your hard limit on the
file size of the database does put a hard limit on the size (in bytes)
of the database, but may provide an unexpectedly low capacity of records
in the worst case (and running data structures at this sort of limit
tends at times to create conditions close to the worst case), at the cost
that the time to insert a record can grow significantly. If you really
have plenty of disk space, then establishing a record limit in the
database, and while you are at it removing one record for every record
added, will smooth out the access time, at the cost of possibly higher
disk usage at times (but maybe a better ratio of size per record).

-- 
Richard Damon



Re: [sqlite] ANN: SQLite3 Decimal Extension

2019-04-05 Thread Richard Damon
On 4/5/19 9:50 AM, Joshua Wise wrote:
> Julian dates are definitely floating point numbers, not integers.

Julian dates, if being used to represent a time on a given date, would be
floating point numbers. A Julian date that only needs to express a time
to the precision of a whole day could be represented as an integer (with
some rule to define which of the likely integers you would choose, given
the natural fuzziness of plain dates).

Thus Apr 5, 2019 could be expressed as an integer (and in fact expressing
it as a float requires adding a lot of detail about the time of day,
versus maybe 1 bit of information)

-- 
Richard Damon



Re: [sqlite] Remove row to insert new one on a full database

2019-04-04 Thread Richard Damon
On 4/4/19 11:35 PM, Simon Slavin wrote:
> On 5 Apr 2019, at 4:14am, Richard Damon  wrote:
>
>> I think is logic is to attempt to insert a row, and if rather than
>> inserting it, the call returns the error condition, 'Database Full'
> Okay.  So now we understand what OP meant by the database being full.
>
> SQLITE_FULL does not mean 'Database Full'.  What's full is the volume the 
> database is stored on.  And the problem is worse than it might appear because 
> if the volume is full you can't reliably do anything to the database.  
> Because even if your next SQL command is DELETE, the first thing that'll 
> happen is that SQLite will try to write to the journal file.  And that may 
> fail, because there's no room for the journal file to get bigger.
>
> In other words, if your database ever gets this big, you need a human to come 
> sort things out.
>
> So don't do that.  Don't let your database get that big.  Monitor the free 
> space and start deleting stuff if free space gets below a certain amount.  
> Leave yourself 5Meg of space free or something.
>
> Simon.
Yes, if the error is the SQLite error from the OS saying the disk is
full, then you are in big trouble, as you can't expect to be able to do
anything. If, however, you have some hooks at the file system level that
return that error when this one file hits a certain size and doesn't allow
it to get bigger, but there is still space for the journal file, you
might get the sort of behavior described.

-- 
Richard Damon



Re: [sqlite] Remove row to insert new one on a full database

2019-04-04 Thread Richard Damon
I think his logic is to attempt to insert a row, and if, rather than
inserting it, the call returns the error condition 'Database Full', he
removes a record and then tries again (a form of error recovery); if it
succeeds, then he goes on and gets more data.

If 'full' were X records, then they would ALWAYS get room to add a new
record after deleting one. Since the reported issue is that it can
require the deletion of possibly very many records for the insert to
succeed, that tells me that 'full' is size based (maybe a max number of
pages). If the record being added is on the larger side of all the 'same
sized' records, then you need to keep deleting records until you delete
one that big, or delete two that are consecutive so that you get a large
enough space, or delete a record that has some spare space next to it
giving enough room.

One issue with this definition is that you can't tell if the database is
currently full except by trying to add the record (or by knowing a LOT of
internal details of record storage), which is what the code is doing.

On 4/4/19 9:19 AM, Stephen Chrzanowski wrote:
> This almost sounds like "Full" is a software limitation, in that your
> application is specifying that "Full" means you can only have "X" number of
> rows.
>
> If you're looking to remove data, I'd suggest that you find some way to
> isolate the oldest record, either by a row identifier (Like an ID field
> that's using auto-increment) or a date/time stamp (Assigned by
> current_timestamp).
>
> Also, your logic is backwards in your pseudo-code.  You should check the
> status of the database before you do any kind of insert. The reason is, if
> you insert into an already full database, then you're database is over-full
> at that point.  Also, your pseudo-code has two conditions to look at...  Do
> this forever, and repeat while status is full.  Not to mention, if your
> database is messed up and nothing can be inserted even though the table is
> empty, you've introduced a lockup.
>
> What I think you're looking more for is:
>
> while (dbStatus() == full) {
>   remove_one_row_from_db();
> }
> result=insert_1_row_to_db();
> if (result != resOK) {
>   die("uhh.. Problem with the database?");
> }
>
>
> On Thu, Apr 4, 2019 at 6:53 AM Arthur Blondel 
> wrote:
>
>> Hello
>>
>> When I try to insert new data to a full SQLite database, I need to remove
>> much more than really needed. I'm doing the following:
>>
>> while(1) {
>> do {
>> status = insert_1_row_to_db();
>> if (status == full) {
>> remove_one_row_from_db();
>> }
>> } while (status == full);}
>>
>> The inserted data has always the same size. When the database is full,
>> removing only one row is enough to insert the new one. But after a while, I
>> need to remove 30, 40 and even more the 100 rows to be able to insert one
>> new row. Is it the correct behavior of SQLite? Is there a way to remove
>> only what is needed and no more? Thanks
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


-- 
Richard Damon



Re: [sqlite] Remove row to insert new one on a full database

2019-04-04 Thread Richard Damon
On 4/4/19 2:07 AM, Arthur Blondel wrote:
> Hello
>
> When I try to insert new data to a full SQLite database, I need to remove
> much more than really needed. I'm doing the following:
>
> while(1) {
> do {
> status = insert_1_row_to_db();
> if (status == full) {
> remove_one_row_from_db();
> }
> } while (status == full);}
>
> The inserted data has always the same size. When the database is full,
> removing only one row is enough to insert the new one. But after a while, I
> need to remove 30, 40 and even more the 100 rows to be able to insert one
> new row. Is it the correct behavior of SQLite? Is there a way to remove
> only what is needed and no more? Thanks

As people say, this isn't a good description of the problem, and I
suspect that your statement of 'same size' is a big part of the issue.
Did you realize that the size of a row can be affected by the values
being inserted into it? This means that if you do have an upper limit on
the size of the database, and need to delete some data to make room for
more, then unless you vacuum the database to squeeze out the holes, you
will need to delete a row that is big enough to hold the new row. After
that, you may have room to store a number of new rows that fit within the
gaps you left behind.

Vacuuming a database can be a slow operation, because it basically needs
to copy the whole database into a new copy, squeezing out the gaps as it
goes. It also means you need space on your system for the two copies of
the database, so if that is the critical issue, it might not be feasible.
Vacuuming, if practical, is the best way (after you delete something)
to make room in the database, as it can bring together all the odd holes
from the various pieces of deleted data.

-- 
Richard Damon



Re: [sqlite] [SPAM?] Re: UPSERT with multiple constraints

2019-03-28 Thread Richard Damon
I think it can all be done in a single table, with columns A, B, C,
Type, Name, and other things, with indexes/constraints
UNIQUE(A, B, C, Type)
UNIQUE(Name, Type)

and quite possibly an additional PRIMARY KEY, perhaps the default ROWID,
as neither of those UNIQUE indexes looks ideal as a primary key for other
tables that might want to refer to an item.
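A sketch of that schema using Python's sqlite3 module (the exact column names and types are assumptions following the discussion):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("""
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,     -- rowid alias, handy as a foreign-key target
        type TEXT, name TEXT, a, b, c,
        UNIQUE (a, b, c, type),
        UNIQUE (name, type)
    )""")
con.execute("INSERT INTO item(type, name, a, b, c) VALUES ('t', 'x', 1, 2, 3)")

# Same (name, type) with different properties violates the second constraint
try:
    con.execute("INSERT INTO item(type, name, a, b, c) VALUES ('t', 'x', 9, 9, 9)")
    raise AssertionError("expected a constraint violation")
except sqlite3.IntegrityError:
    pass  # e.g. "UNIQUE constraint failed: item.name, item.type"
```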

On 3/28/19 1:59 AM, Roger Schlueter wrote:
> Following Simons' comment, changing the schema to conform to SQL
> expectations would involve having at least two tables.  Consider your
> second uniqueness criterion; Let's call those items a "Widget" so your
> Widget table would be:
>
> WIDGETS
> {A, B, C, .}  UNIQUE(A,B,C)
>
> Let's call your items whose Name is unique "Gadgets" so your Gadgets
> table would be:
>
> GADGETS
> {Name, A, B, C, }  UNIQUE(Name)
>
> I assume there are other things:
>
> THINGS
> {Type, Name, A, B, C, .}  No(?) uniqueness
>
> Knowing the Type of items to be updated, you know which table to use.
>
> On 3/27/2019 15:59, Thomas Kurz wrote:
>>> Can I ask what it is that you're trying to do ?  This smacks of
>>> trying to add 1 to an existing value or something like that.
>> Sure. I have a table of items. Each item has a type, a name, and
>> properties A, B, C (and some more, but they're not relevant here).
>>
>> I want to enforce that items of a certain type and name are unique:
>> UNIQUE (type, name). But there can be multiple items with the same
>> name as long as they are of different types.
>>
>> Furthermore, items of a certain type that have identical properties
>> A, B, C are also considered equal, regardless of their name: UNIQUE
>> (type, A, B, C).
>>
>> I cannot use UNIQUE (type, name, A, B, C), as this would mean that
>> there can be two items with the same A, B, C (and type, of course),
>> but different name. On the other hand, there could be two items with
>> the same same (and type, of course) but different A, B, C.
>>
>> Now when inserting an item that already exists (according to the
>> uniqueness definition above), the existing item should be updated
>> with the new name and A, B, C properties.
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


-- 
Richard Damon



Re: [sqlite] DATA RACE on sqlite3GlobalConfig.isInit

2019-03-21 Thread Richard Damon
On 3/21/19 7:43 AM, Плотников Павел wrote:
> Hello,
>
> The race relates to the usage of sqlite3GlobalConfig.isInit variable within 
> the sqlite3_initialize routine.
>
> As for 3.27.2 source code, the lines to be highlighted are
>
> sqlite3.c:153852: if( sqlite3GlobalConfig.isInit ) return SQLITE_OK;
> and
> sqlite3.c:153938: sqlite3GlobalConfig.isInit = 1;
>
> The race is that there are no obstacles for the code from these lines to be 
> executed simultaneously: the if-expression gets evaluated with none of 
> synchronization applied.

That really isn't a data race in the sense I am used to seeing it. At
least not at that level; there is perhaps one at a higher level using that.

I presume isInit is in a variable that is appropriately sized so that
the fetch and set are atomic operations (if not, then there MIGHT be a
race possible).

Yes, the test might give either a true or a false answer depending on
the exact timing of execution (does the test see the value before or
after the set?), but that by itself isn't a race. The mere fact that
you have parallel execution means that both answers are allowed unless
something has been done to enforce a particular order. In particular,
both answers would still be possible even if you added mutual exclusion
around those two operations alone so they can't occur 'at the same
time'; it would just depend on which one happened first.

Now, if the code, after seeing that isInit was false, performs an
initialization that is improper because it has already been done, then
THAT would be a data race, but its scope is bigger than just those two
lines.

-- 
Richard Damon



Re: [sqlite] Handling ROLLBACK

2019-03-03 Thread Richard Damon
On Mar 3, 2019, at 8:32 AM, Simon Slavin  wrote:
> 
> To summarize, the list feels that this is an incorrect model
> 
>BEGIN;
>... first set of commands
>ROLLBACK;
>... second set of commands
>END;
> 
> whereas this is how things are meant to work:
> 
>BEGIN;
>... first set of commands
>ROLLBACK;
>BEGIN;
>... second set of commands
>END;
> 
> and that since ROLLBACK ends a transaction, it releases locks.  The above is 
> correct for all journalling models except for OFF, where the effect of 
> ROLLBACK is undefined.  (I'm ignoring SAVEPOINTs for now.)
> 
> Simon.
> 

And the way to get the first pattern is to set a SAVEPOINT right after the 
BEGIN and use ROLLBACK TO that savepoint instead of a plain ROLLBACK.


Re: [sqlite] ON CONFLICT with partial indexes

2019-02-18 Thread Richard Damon
> On Feb 18, 2019, at 10:23 AM, Charles Leifer  wrote:
> 
> Thanks for the explanation. So does this mean that two identical queries
> can have completely different query plans depending on whether they use a
> parameters vs literal values embedded in the SQL string?
> 

Remember that the query plan is determined when the statement is compiled, 
which is BEFORE you bind the parameters, so the plan cannot depend on the 
values of parameters. There is no later attempt to re-optimize once the values 
are known, as in general this isn't apt to help (you have one case where it 
could, but to help that case, you would need to hurt a lot of other more 
common cases).


Re: [sqlite] sqlite 3.37.1: void function returns value

2019-02-11 Thread Richard Damon
> On Feb 11, 2019, at 6:33 AM, Peter da Silva  wrote:
> 
> I am pretty sure that the code is not legal C because it's using the return
> value of a void function, as well as returning a value from a void
> function. Compilers that "do what I mean" and accept it are in error. It's
> certainly possible that some obscure clause in some C standard blesses it
> but I can't imagine why they would.

I would need to dig through the relevant Standards to confirm, but my memory 
is that C++ added this feature to help with templates: you might have a 
template whose return type is a template parameter, so `return f();` is a 
logical possibility even when that type turns out to be void. C did not need 
this, so it didn't adopt it (though possibly some later version did for 
compatibility).

Many C compilers are also C++ compilers and often accept code that uses 
features of one language that aren't in the other as an extension, generating 
a warning only if a 'be fussy' flag is set. (Few compilers are fully 
conforming to the Standard by default.)

Even if this syntax was adopted by C in some later Standard, using it would 
needlessly require a much later version of the Standard than is otherwise 
needed.


Re: [sqlite] GROUP BY and ICU collation

2019-02-08 Thread Richard Damon
On 2/8/19 9:11 AM, David Raymond wrote:
> Remember that fancy collations don't just look at 1 character at a time, they 
> look at the whole thing, and can do surprising stuff based on that. In this 
> case the order of preference for the collation looks like "when it's part of 
> a larger word, then treating 'S' and 'Š' the same is more important than 
> separating them. But when it's just 1 character then they're different."
>
> So 'S' might be before 'Š', but in words it might go
>
> 'Sam'
> 'Šam'
> 'Skunk'
> 'Škunk'
> 'Sudden'
> 'Šudden'
>
> rather than a simple character-at-a-time order of
>
> 'Sam'
> 'Skunk'
> 'Sudden'
> 'Šam'
> 'Škunk'
> 'Šudden'
>
Actually, the way those collations work is that on a first pass S and Š
compare equal; if two words compare equal on the first pass, then
effectively a second pass takes place, and on that pass S and Š compare
with a defined order. (I seem to remember there even being cases that
needed a third pass.)
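That multi-pass scheme can be imitated with a composite sort key: a primary key with diacritics stripped, tie-broken by the raw string. A Python sketch using the words from the example above (`sort_key` is just a demo helper, not ICU's algorithm):

```python
import unicodedata

def sort_key(word):
    # First pass: compare with diacritics stripped, so 'S' sorts like 'Š'.
    # Second pass: the raw string breaks ties, putting 'S' before 'Š'.
    primary = "".join(c for c in unicodedata.normalize("NFD", word)
                      if not unicodedata.combining(c))
    return (primary, word)

words = ['Šudden', 'Skunk', 'Sam', 'Škunk', 'Šam', 'Sudden']
print(sorted(words, key=sort_key))
# ['Sam', 'Šam', 'Skunk', 'Škunk', 'Sudden', 'Šudden']
```

This reproduces the interleaved order from the quoted post rather than the naive one-character-at-a-time order.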

-- 
Richard Damon



Re: [sqlite] Database locking problems

2019-01-21 Thread Richard Damon
On 1/21/19 4:38 PM, James K. Lowden wrote:
> On Sun, 20 Jan 2019 17:01:25 -0700
> "Keith Medcalf"  wrote:
>
>> SQLite3 however has latencies on the order of microseconds 
> Is that really true?  Are there machines for which SQLite's throughput
> can be measured in transactions per millisecond?  
>
> I think you're referring to the latency of the function-call overhead,
> as opposed to using a network interface.  But since DBMSs are basically
> I/O machines, and the most interesting operations involve I/O, it's not
> clear to me why function-call overhead is a relevant measure.  
>
>> [SQLite] does not have the same opportunity for opportunistic
>> behaviour as does a client/server database which may be serving
>> thousands of concurrent (but different) applications.
> That I think is the relevant measure.  It's the number of clients, not
> latency, that makes SQLite's lock-the-whole database feasible.  On a
> large machine supporting thousands of clients, the latency advantage
> would be overwhelmed by the concurrency disadvantage, depending of
> course on the rate of updates.  
>
> --jkl

SQLite is NOT intended to be used over a network, and in fact there are
warnings about accessing the database file over a network file system,
as such systems often don't support the assumed locking primitives.
SQLite is intended to be accessed over the local file system. Some
operations can be on the order of microseconds if the data resides in
cache, slightly longer if the database is in flash memory, and perhaps
hundreds of microseconds to milliseconds if the database is on spinning
rust and the operation needs to access the drive to get the needed data.

-- 
Richard Damon



Re: [sqlite] Database locking problems

2019-01-20 Thread Richard Damon
On 1/20/19 6:32 PM, Thomas Kurz wrote:
> Just for curiosity: how do other DBMS (MySQL, etc.) solve this issue? I guess 
> the keypoint is that no matter where the query comes from, the database files 
> are always under control of the same process which then can take care of the 
> correct order in which to read and write data.
>
> But the problem that the same database page cannot be written to from 2 
> different statements (or being read while another statement is writing into 
> it) should arise there as well, shouldn't it? However, I have never seen 
> MySQL return some kind of "busy" (whatsoever it may be named there).
>
> As I said, just for curiosity - no offense against SQlite because I can well 
> understand the problem that SQlite has to deal with :-)
>
If the issue is deadlock, you get similar issues with all DBMSes. One
thing that does help is that many DBMSes do transaction locking at the
record level, not the whole database (at an overhead cost), somewhat
like SQLite's experimental concurrency option. Running into deadlocks
is a real issue with MySQL databases too.

If the issue is improper sharing between threads, then the fact that
other DBMSes run the database server in a separate process handles a
lot of the sharing issues (at the cost of overhead). SQLite gets rid of
a lot of that overhead at the cost of requiring the application to
follow a set of rules.

-- 
Richard Damon



Re: [sqlite] Database locking problems

2019-01-20 Thread Richard Damon
On 1/20/19 4:51 PM, andrew.g...@l3t.com wrote:
> James K. Lowden wrote:
>> On Sat, 19 Jan 2019 08:07:42 -0500 Richard Hipp  wrote:
>>> The busy timeout is not working because you start out your transaction
>>> using a read operation - the first SELECT statement - which gets a read
>>> lock.  Later when you go to COMMIT, this has to elevate to a write
>>> lock.  But SQLite sees that some other process has already updated the
>>> database since you started your read.
>> Another solution is to rely on atomicity in SQL:
>>
>> insert into t
>> select :pid, nrows, N
>> from (select 1 as N union select 2 union select 3) as cardinals
>> cross join (select :pid, count(*) as nrows from t) as how_many;
>>
>> By using a single SQL statement, you avoid a user-defined transaction
>> and any proprietary transaction qualifiers.
> Thank you for the suggestion, but I don't believe this is an option in my 
> application.  There's too much back-and-forth between the database and my 
> logic to put it all into a single statement.  Thus, transactions are 
> necessary.  Transactions exist to allow multiple statements to become an 
> atomic unit, so eschewing them is basically the same thing as admitting they 
> don't work.  There are two possibilities:
>
> 1. Transactions do work, but I'm misusing them and must learn how to be more 
> careful.  In this case, I will update documentation to properly explain their 
> use to others.
>
> 2. Transactions don't work, at least not for my task.  In this case, I will 
> do my best to investigate the problem and suggest a correction.
>
> Either way, the right thing for me to do is continue digging in.
>
> Going from BEGIN to BEGIN IMMEDIATE indeed fixed the test program I wrote, 
> but my application is still broken, and I don't know what's different about 
> it.  I'm working on instrumenting the fcntl() calls to log the sequence of 
> operations.

Transactions work, but the way you have described your use of them has
a predictable issue. If you begin with a shared lock and later try to
promote it to an exclusive lock in order to write, you must wait for
all the other shared locks to clear; if another holder of a shared lock
also tries to promote to an exclusive lock, that causes a deadlock,
which sounds like your issue.

Starting with BEGIN IMMEDIATE breaks this deadlock situation, so it can
fix your problem, at the cost of less concurrency.

The experimental concurrency option might also help, in that a write
operation doesn't need an exclusive lock for the whole database, but
does introduce more points where a transaction might fail and need to be
rolled back.
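The difference is easy to see with two connections in Python's `sqlite3` module (`timeout=0` makes the second writer fail immediately instead of waiting; the file path is just for the demo):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
a = sqlite3.connect(path, isolation_level=None)
b = sqlite3.connect(path, isolation_level=None, timeout=0)
a.execute("CREATE TABLE t (x)")

# Writer A takes the write lock up front, instead of holding a shared
# read lock and trying to promote it at COMMIT time.
a.execute("BEGIN IMMEDIATE")

# Writer B gets SQLITE_BUSY immediately rather than joining A in a
# promote-to-exclusive standoff (the deadlock described above).
busy = False
try:
    b.execute("BEGIN IMMEDIATE")
except sqlite3.OperationalError as e:
    busy = True
    print(e)  # database is locked

a.execute("COMMIT")
```

Because the conflict is detected at BEGIN rather than at COMMIT, the losing writer can simply retry; there is no half-finished transaction to roll back.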


-- 
Richard Damon



Re: [sqlite] Conditional lowering of value

2019-01-04 Thread Richard Damon
On 1/4/19 6:30 AM, Cecil Westerhof wrote:
> I have the following query:
> SELECT MIN(totalUsed) - 1
> FROM   quotes
> WHERE  totalUsed <> 'notUsed'
>
> What I want is that when the SELECT returns a positive value every record
> where totalUsed <> 'notUsed' is lowered with the returned value. Is easy to
> accomplish in programming code, but I was wondering if this could be done
> with a SQL statement.
>
Look at the UPDATE query:

UPDATE quotes SET totalUsed = totalUsed - 1 WHERE totalUsed <> 'notUsed';


I will note that this database is evidently relying on SQLite's
flexible typing, a departure from standard SQL, with mixed-type
columns: totalUsed appears to be a column that is normally numeric but
is sometimes a string.
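To get the conditional renormalization the original question asked for (subtract `MIN(totalUsed) - 1` from every used row, but only when that offset is positive), one sketch in Python's `sqlite3` module — the schema and sample data are invented for the demo:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE quotes (quote TEXT, totalUsed)")
con.executemany("INSERT INTO quotes VALUES (?, ?)",
                [("a", 3), ("b", 5), ("c", "notUsed")])

# Compute the offset once, then apply it only when positive, so the
# smallest used count is renormalized down to 1.
(offset,) = con.execute(
    "SELECT MIN(totalUsed) - 1 FROM quotes WHERE totalUsed <> 'notUsed'"
).fetchone()
if offset and offset > 0:
    con.execute(
        "UPDATE quotes SET totalUsed = totalUsed - ? "
        "WHERE totalUsed <> 'notUsed'",
        (offset,),
    )
print(con.execute("SELECT totalUsed FROM quotes ORDER BY rowid").fetchall())
# [(1,), (3,), ('notUsed',)]
```

Computing the offset before the UPDATE (rather than embedding the aggregate subquery in the SET clause) keeps the value fixed while rows are rewritten.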

-- 
Richard Damon



Re: [sqlite] A Minor Issue Report: Extra const Keyword in PragmaName zName

2019-01-02 Thread Richard Damon
On 12/30/18 6:10 PM, Richard Green wrote:
> I believe the struct PragmaName (in pragma.h) has an extra 'const'
> keyword for zName, in Version 3.26.0 (2018-12-01); probably has no
> effect.
>
> Currently,
>
> /* Definitions of all built-in pragmas */
> typedef struct PragmaName {
>   const char *constzName; /* Name of pragma */
>   u8 ePragTyp; /* PragTyp_XXX value */
>   u8 mPragFlg; /* Zero or more PragFlg_XXX values */
>   u8 iPragCName;   /* Start of column names in pragCName[] */
>   u8 nPragCName;   /* Num of col names. 0 means use pragma
> name */
>   u64 iArg;    /* Extra argument */
> } PragmaName;
>
>
> Probably should be,
>
> /* Definitions of all built-in pragmas */
> typedef struct PragmaName {
>   const char *zName;   /* Name of pragma */
>   u8 ePragTyp; /* PragTyp_XXX value */
>   u8 mPragFlg; /* Zero or more PragFlg_XXX values */
>   u8 iPragCName;   /* Start of column names in pragCName[] */
>   u8 nPragCName;   /* Num of col names. 0 means use pragma
> name */
>   u64 iArg;    /* Extra argument */
> } PragmaName;
>
>
>
> Thank you,
>
> Richard Green

const char *constzName;

is defining a member named constzName, so changing it would require
changing every place it is used.

if it was

const char *const zName; // Note extra space

Then that declares zName as an immutable pointer to an immutable
string/character, which is likely what was intended, as the code
shouldn't be changing the names of pragmas.

Your modified definition says that the code is allowed to poke through
zName and change the string representing the pragma.

-- 
Richard Damon



Re: [sqlite] Using sqlite3_interrupt with a timeout

2018-12-31 Thread Richard Damon
On 12/31/18 4:10 PM, Jesse Rittner wrote:
> Simon Slavin-3 wrote
>> You can calculate an "end time" yourself, check it each time you're ready
>> to call sqlite3_step(), and jump straight to sqlite3_finalize() if your
>> time is up.
> I'm not familiar with the inner workings of sqlite3_step, but if that itself
> takes time, then I'd like to interrupt the call prematurely.
>
Per the documentation, sqlite3_interrupt() sets a flag that the bytecode
engine checks periodically while a statement is executing, so it can in
fact cut short a step that is actually being processed: a long-running
sqlite3_step() aborts partway through and returns SQLITE_INTERRUPT.
Only an operation that is very nearly finished when the flag is set may
still run to completion.
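This can be demonstrated from Python, where `Connection.interrupt()` wraps sqlite3_interrupt(): a single slow query is abandoned mid-step and raises an error. The 0.2-second timer and the size of the cross join are arbitrary demo choices:

```python
import sqlite3
import threading

con = sqlite3.connect(":memory:", check_same_thread=False)
con.execute("CREATE TABLE t (x)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

# Ask SQLite to abandon the running statement from another thread.
timer = threading.Timer(0.2, con.interrupt)
timer.start()

interrupted = False
try:
    # A deliberately slow query: counting a billion-row cross join keeps
    # a single call to step() busy far longer than the timer interval.
    con.execute("SELECT count(*) FROM t a, t b, t c").fetchall()
except sqlite3.OperationalError:
    interrupted = True
timer.cancel()
print(interrupted)  # True
```

Without the mid-step flag check, this query would have to run to completion before the interrupt could take effect.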

-- 
Richard Damon


