Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-10 Thread Nick

On 10 Dec 2014, at 07:35, Dan Kennedy wrote:

> Strictly speaking the database file may not be well-formed even if there is 
> no ongoing checkpoint. If:
> 
>  a) process A opens a read transaction,
>  b) process B opens and commits a write transaction to the database,
>  c) process C checkpoints the db,
> 
> then the db file considered without the *-wal file may be corrupt. The 
> problem comes about because process C can only checkpoint frames up until the 
> start of B's transaction. And there is an optimization that will prevent it 
> from copying any earlier frames for which there exists a frame in B's 
> transaction that corresponds to the same database page. So it effectively 
> copies only a subset of the modifications made by earlier transactions into 
> the db file - not necessarily creating a valid db file.

Can this corruption be detected by running PRAGMA quick_check / 
integrity_check? Having the occasional backup db corrupted would be tolerable.
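
A minimal sketch of how such a check might be run against the copied file - 
the file name here is hypothetical, and the pragmas must be run on the copy, 
not on the live database:

-- sqlite3 backup.db
PRAGMA quick_check;      -- faster; skips some of the more expensive checks
PRAGMA integrity_check;  -- slower but more thorough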

Regards
Nick 


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-10 Thread Nick

On 10 Dec 2014, at 02:36, Simon Slavin wrote:

> 
> On 10 Dec 2014, at 12:30am, Nick  wrote:
> 
>> That's interesting, Simon; I didn't expect the database not to be trustworthy.
> 
> The database will be trustworthy at any instant.  Your copy of it will be 
> corrupt because the file will be changing while you are copying it.
> 
>> In WAL mode I thought the database file is only written to when 
>> checkpointing. Have I misunderstood this journaling mode?
> 
> How do you intend to prevent your other processes from checkpointing while 
> you take the backup ?  You can disable checkpointing for your own connection 
> to the database but not for the connections other processes have.

All the processes would have automatic checkpointing disabled. Just the backup 
process would perform the checkpoint.
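
A sketch of that arrangement (assuming every writing process runs the first 
pragma on its own connection, and only the backup process runs the second):

PRAGMA wal_autocheckpoint = 0;    -- no automatic checkpoints on this connection

-- backup process only, before copying the file:
PRAGMA wal_checkpoint(RESTART);   -- move the WAL content into the db file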

>> Again I may have misunderstood the docs around the Backup API, does it not 
>> start again from the beginning copying pages if another process writes to 
>> the database during the process? In practice could it successfully backup a 
>> 2GB database that is being written to once a second?
> 
> Not if the writing never stopped.  But there's no way to take a copy of a 
> file which is constantly being rewritten.  rsync can't do it either.  How can 
> anything copy a file which is constantly being modified ?
> 
> You can BEGIN EXCLUSIVE and then END once your backup is finished.  That 
> should prevent other processes writing to the file.  You will have to deal 
> with what happens if your BEGIN EXCLUSIVE times out, and you will have to put 
> long timeouts in your other processes so they can handle the file being 
> locked long enough for the entire copy to be taken.  That's the only way I 
> can think of to do it.  And yes, it will prevent writing to the database 
> while it's being copied.
> 
> On the other hand, there's a different way to clone a database: log the 
> changes.
> 
> When something issues an INSERT/DELETE/UPDATE command, execute the command 
> but also append a copy of that command to a text file somewhere.  When you 
> want to bring your backup copy up-to-date, take a copy of the log file, then 
> execute all the commands in it to your out-of-date copy.
> 
> You need a method of zeroing out the log file, or knowing where you got to on 
> your last backup.
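
A sketch of the locking approach (the file copy itself is whatever mechanism 
you already use):

BEGIN EXCLUSIVE;    -- may fail with SQLITE_BUSY; a long busy_timeout helps
-- ... copy the database file (and any -wal file) here ...
END;

And a sketch of the change-log idea, using triggers into a log table rather 
than a text file - the table t(a, b) and the trigger names are hypothetical, 
and column a is assumed to identify a row:

CREATE TABLE change_log (id INTEGER PRIMARY KEY, cmd TEXT);

CREATE TRIGGER t_log_insert AFTER INSERT ON t BEGIN
  INSERT INTO change_log(cmd) VALUES
    ('INSERT INTO t VALUES(' || quote(new.a) || ',' || quote(new.b) || ');');
END;

CREATE TRIGGER t_log_delete AFTER DELETE ON t BEGIN
  INSERT INTO change_log(cmd) VALUES
    ('DELETE FROM t WHERE a = ' || quote(old.a) || ';');
END;

-- To update the backup: replay cmd in id order against the copy, then
-- delete the replayed rows from change_log.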

Thanks for the info, Simon.

Regards
Nick



Re: [sqlite] sqlar

2014-12-10 Thread Scott Robison
On Wed, Dec 10, 2014 at 3:07 PM, Richard Hipp  wrote:

> On Wed, Dec 10, 2014 at 4:58 PM, Scott Robison  wrote:
>
> > In the function add_file it checks if a filename is a directory. If so, it
> > calls opendir, and if successful, loops on readdir. But it never calls
> > closedir.
>

I haven't downloaded the update, but that is the exact same fix I applied
to confirm the problem on my side. Two thumbs up! Thanks.

-- 
Scott Robison


Re: [sqlite] sqlar

2014-12-10 Thread Richard Hipp
On Wed, Dec 10, 2014 at 4:58 PM, Scott Robison  wrote:

> I realize that sqlar is not intended as a production quality tool. That
> being said, I was doing some experiments today and encountered a bug (at
> least on FreeBSD, not sure if Linux is impacted).
>
> In the function add_file it checks if a filename is a directory. If so, it
> calls opendir, and if successful, loops on readdir. But it never calls
> closedir.
>
> In my scenario, it would fail after processing 198 files. I'm sure that
> number depends on how many directories there were - I never thought to
> track it - and it's probably platform defined anyway.
>
>
So does the latest commit fix the problem?

-- 
D. Richard Hipp
d...@sqlite.org


[sqlite] sqlar

2014-12-10 Thread Scott Robison
I realize that sqlar is not intended as a production quality tool. That
being said, I was doing some experiments today and encountered a bug (at
least on FreeBSD, not sure if Linux is impacted).

In the function add_file it checks if a filename is a directory. If so, it
calls opendir, and if successful, loops on readdir. But it never calls
closedir.

In my scenario, it would fail after processing 198 files. I'm sure that
number depends on how many directories there were - I never thought to
track it - and it's probably platform defined anyway.

Just an FYI.

-- 
Scott Robison


[sqlite] News typos

2014-12-10 Thread Kyle Shannon
I just noticed that the version links for the last two releases at:

http://sqlite.org/news.html

seem to be cut/paste errors: 3.8.7.3 and 3.8.7.4 both have links to
the 3.8.7.2 release.

-- 
Kyle


Re: [sqlite] replace many rows with one

2014-12-10 Thread RSmith


On 2014/12/10 13:39, Simon Slavin wrote:

> Dear folks,
> 
> A little SQL question for you.  The database file concerned is purely for data 
> manipulation at the moment.  I can do anything I like to it, even at the schema 
> level, without inconveniencing anyone.
> 
> I have a TABLE with about 300 million (sic.) entries in it, as follows:
> 
> CREATE TABLE s2 (a TEXT, b TEXT, theCount INTEGER)
> 
> There are numerous cases where two or more rows (up to a few thousand in some 
> cases) have the same values for a and b.  I would like to merge those rows into 
> one row with a 'theCount' which is the total of all the merged rows.  
> Presumably I do something like
> 
> CREATE TABLE s2merged (a TEXT, b TEXT, theCount INTEGER)
> 
> INSERT INTO s2merged SELECT DISTINCT ... FROM s2


I think the one you are looking for is:

INSERT INTO s2merged SELECT a, b, sum(theCount) FROM s2 GROUP BY a,b;

Not sure if your theCount field already contains totals or if it just has 
1's... how did the duplication happen? If it just has 1's, you might also be 
able to use simply:


INSERT INTO s2merged SELECT a, b, count() FROM s2 GROUP BY a,b;

Either way, the last query will obviously show the duplication counts (if 
needed as an exercise).

For 300 mil rows this will be rather quick if it's going to be a once-off thing 
and not something running often. I'd say it will take under an hour depending 
on hardware and how much duplication happened in s2.  Making an index will take 
a lot longer; you are better off just running the merge as above - unless of 
course the eventual use of s2merged includes being a look-up attached DB or 
such, in which case making an index from the start will be worthwhile.
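
For instance (a sketch; the index name is arbitrary, and UNIQUE is safe here 
because the GROUP BY produces exactly one row per (a, b) pair):

CREATE UNIQUE INDEX s2merged_ab ON s2merged(a, b);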





Re: [sqlite] replace many rows with one

2014-12-10 Thread Hick Gunter
Both, I guess

Insert into ... select a, b, sum(theCount) from s2 group by a, b;

-----Original Message-----
From: Simon Slavin [mailto:slav...@bigfraud.org]
Sent: Wednesday, 10 December 2014 12:39
To: General Discussion of SQLite Database
Subject: [sqlite] replace many rows with one

Dear folks,

A little SQL question for you.  The database file concerned is purely for data 
manipulation at the moment.  I can do anything I like to it, even at the schema 
level, without inconveniencing anyone.

I have a TABLE with about 300 million (sic.) entries in it, as follows:

CREATE TABLE s2 (a TEXT, b TEXT, theCount INTEGER)

There are numerous cases where two or more rows (up to a few thousand in some 
cases) have the same values for a and b.  I would like to merge those rows into 
one row with a 'theCount' which is the total of all the merged rows.  
Presumably I do something like

CREATE TABLE s2merged (a TEXT, b TEXT, theCount INTEGER)

INSERT INTO s2merged SELECT DISTINCT ... FROM s2

and there'll be a TOTAL() in there somewhere.  Or is it GROUP BY ?  I can't 
seem to get the right phrasing.

Also, given that this is the last operation I'll be doing on table s2, will it 
speed things up to create an index on s2 (a,b), or will the SELECT just spend 
the same time making its own temporary index ?

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at





Re: [sqlite] replace many rows with one

2014-12-10 Thread Martin Engelschalk

Hi Simon,

On 10.12.2014 12:39, Simon Slavin wrote:

> Dear folks,
> 
> A little SQL question for you.  The database file concerned is purely for data 
> manipulation at the moment.  I can do anything I like to it, even at the schema 
> level, without inconveniencing anyone.
> 
> I have a TABLE with about 300 million (sic.) entries in it, as follows:
> 
> CREATE TABLE s2 (a TEXT, b TEXT, theCount INTEGER)
> 
> There are numerous cases where two or more rows (up to a few thousand in some 
> cases) have the same values for a and b.  I would like to merge those rows into 
> one row with a 'theCount' which is the total of all the merged rows.  
> Presumably I do something like
> 
> CREATE TABLE s2merged (a TEXT, b TEXT, theCount INTEGER)
> 
> INSERT INTO s2merged SELECT DISTINCT ... FROM s2

insert into s2merged (a, b, theCount) select a, b, sum(theCount) from s2 
group by a, b;

> and there'll be a TOTAL() in there somewhere.  Or is it GROUP BY ?  I can't 
> seem to get the right phrasing.
> 
> Also, given that this is the last operation I'll be doing on table s2, will it 
> speed things up to create an index on s2 (a,b), or will the SELECT just spend 
> the same time making its own temporary index ?

Creating the index and then selecting with it will probably be slower than 
the select without an index.

> Simon.


[sqlite] replace many rows with one

2014-12-10 Thread Simon Slavin
Dear folks,

A little SQL question for you.  The database file concerned is purely for data 
manipulation at the moment.  I can do anything I like to it, even at the schema 
level, without inconveniencing anyone.

I have a TABLE with about 300 million (sic.) entries in it, as follows:

CREATE TABLE s2 (a TEXT, b TEXT, theCount INTEGER)

There are numerous cases where two or more rows (up to a few thousand in some 
cases) have the same values for a and b.  I would like to merge those rows into 
one row with a 'theCount' which is the total of all the merged rows.  
Presumably I do something like

CREATE TABLE s2merged (a TEXT, b TEXT, theCount INTEGER)

INSERT INTO s2merged SELECT DISTINCT ... FROM s2

and there'll be a TOTAL() in there somewhere.  Or is it GROUP BY ?  I can't 
seem to get the right phrasing.

Also, given that this is the last operation I'll be doing on table s2, will it 
speed things up to create an index on s2 (a,b), or will the SELECT just spend 
the same time making its own temporary index ?

Simon.


Re: [sqlite] Bugreport - slowdown in sqlite after the ANALYZE statement

2014-12-10 Thread Jan Staněk
Thank you very much for the explanation and tips, they are appreciated.

On 9.12.2014 at 14:30, Richard Hipp wrote:
> Answered by adding a comment at
> https://bugzilla.redhat.com/show_bug.cgi?id=1161844
> 
> On Tue, Dec 9, 2014 at 6:06 AM, Jan Staněk  wrote:
> 
> Hi,
> some of the Banshee users noticed a huge slowdown in its operation
> after upgrading to version 3.8.7 from 3.8.6. Here is the related log:
> 
> [4 Debug 13:24:27.263] Executed in 12819ms
> DELETE FROM CoreCache WHERE ModelID = 9;
> INSERT INTO CoreCache (ModelID, ItemID) SELECT
> 9, CoreTracks.TrackID
> FROM (SELECT MIN(CoreTracks.TrackID) AS TrackID,
> CoreTracks.Year FROM CoreTracks GROUP BY CoreTracks.Year) AS CoreTracks
> WHERE CoreTracks.Year IN
> (SELECT CoreTracks.Year FROM CoreTracks, CoreCache
> WHERE CoreCache.ModelID = 371 AND
>   CoreCache.ItemID = CoreTracks.TrackID )
> ORDER BY Year
> 
> Reverting to 3.8.6 gives back a fast answer:
> 
> [4 Debug 13:21:05.433] Executed in 24ms
> DELETE FROM CoreCache WHERE ModelID = 9;
> INSERT INTO CoreCache (ModelID, ItemID) SELECT
> 9, CoreTracks.TrackID
> FROM (SELECT MIN(CoreTracks.TrackID) AS TrackID,
> CoreTracks.Year FROM CoreTracks GROUP BY CoreTracks.Year) AS CoreTracks
> WHERE CoreTracks.Year IN
> (SELECT CoreTracks.Year FROM CoreTracks, CoreCache
> WHERE CoreCache.ModelID = 371 AND
>   CoreCache.ItemID = CoreTracks.TrackID )
> ORDER BY Year
> 
> The original bug reporter then went on and possibly isolated the bug.
> Details are at https://bugzilla.redhat.com/show_bug.cgi?id=1161844 .
> 
> Thanks for your work,
-- 
Jan Stanek - Red Hat Associate Developer Engineer - Databases Team


Re: [sqlite] seeking advice

2014-12-10 Thread Paul
Hello, Rene

> Hi there,
> 
> I have to store and retrieve up to 2000 parameters.
> The parameters can have real and integer values.
> The max. change rate is 100ms and the max. duration is up to some hours.
> 
> The simple solution would be to use plain binary files. It's fast but not 
> flexible.
> 
> So I came to sqlite. Before starting I would like to ask if someone could give 
> me some advice which way to go. Here are my current versions.
> 
> Version 1: One big table with time stamp and one column for each parameter
> - I can have up to SQLITE_MAX_COLUMNS=2000 but how can I insert all values 
> with only SQLITE_MAX_VARIABLE_NUMBER=999?
> - What about space consumption if NULL values are given?

You can always recompile SQLite with the desired limits.
Or you can make your own virtual table implementation.
In my opinion, a virtual table would be the most efficient 
way to feed data into SQLite.
Example:

INSERT INTO my_table(timestamp, param_1, ..., param_n) 
SELECT CURRENT_TIMESTAMP, _1, _2, ..., _n FROM my_virtual_table;

Though implementing a virtual table correctly will take you some time
that you would rather spend elsewhere.
The rule of thumb: do not bother with a virtual table if CPU is not a bottleneck.

Use this approach if you always need to query all of the parameters.

> 
> Version 2: One table for each parameter with time stamp and value
> - Does this scale for up to 2000 parameters?
> - Can sqlite handle so much tables?
> 

First of all, it depends on your usage. If at any given point in
time you may need only one or a few parameters and don't care about
the others, you should group them in separate tables or even make a personal
table for each of them. In this case, querying individual parameters or 
groups of parameters will be faster, since SQLite will not pick up 
extra data from the disk. But you must realise that each table will have 
its own copy of the timestamp and potentially a ROWID (for non-WITHOUT ROWID 
tables) that will eat up extra space. Also, inserts/updates will be much more 
expensive because data in the database will be fragmented.
So, for example, if inserting one row into a 2000-column table takes a couple 
of page writes, inserting into 2000 tables forces at least 2000 pages to be 
written to disk.

Consider the fact that in this case it would be a nightmare to 
make sure every timestamp value is associated with a value of every parameter
(if you have to provide this guarantee).

Yes, SQLite can handle this many tables.

> Version 3: One table with time stamp, parameter id and parameter value
> - Is it working when all values change?
> - Is retrieving values for one parameter fast?

To me, this looks like the most efficient way of implementing what you want.

Retrieval speed will depend on the indexes you have.
If, when you are querying parameters, you always specify a timestamp
or a time range, you can get away with a compound index on (timestamp, parameter_id).
Then selects like:

SELECT value FROM table WHERE timestamp = X;
SELECT value FROM table WHERE timestamp = X AND ...;
SELECT value FROM table WHERE timestamp > X;
SELECT value FROM table WHERE timestamp < X;
SELECT value FROM table WHERE timestamp > X AND timestamp < Y;
SELECT value FROM table WHERE timestamp IN (X1, ..., Xn);


will always be efficient.

Do not create a separate index like this; make it the PRIMARY KEY,
and experiment with how WITHOUT ROWID performs.
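
A sketch of such a schema (table and column names are hypothetical):

CREATE TABLE params (
  timestamp     INTEGER NOT NULL,   -- e.g. milliseconds since some epoch
  parameter_id  INTEGER NOT NULL,
  value         REAL,
  PRIMARY KEY (timestamp, parameter_id)
) WITHOUT ROWID;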


> 
> I will write and read the data at the same time. But writing should have 
> priority. I could live with a potential loss of data because of a power 
> failure. So I will try the settings of the options "synchronous" and 
> "journal_mode". Anything else I should consider?
> 

You may also play around with temp_store and cache_size pragmas.
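
For example (a sketch only - the values are illustrations, not 
recommendations, and WAL journal mode is assumed):

PRAGMA journal_mode = WAL;     -- readers do not block the writer
PRAGMA synchronous = NORMAL;   -- in WAL mode risks only the last commits on
                               -- power loss, not database corruption
PRAGMA temp_store = MEMORY;    -- keep temporary tables and indices in RAM
PRAGMA cache_size = -65536;    -- negative value is KiB, so 64 MiB of page cache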