[sqlite] Incremental backup/sync facility?

2016-05-09 Thread Wade, William
> From: Stephan Buchert [mailto:stephanb007 at gmail.com]
> Sent: Saturday, May 07, 2016 12:10 AM

> Copying the WAL files is probably more efficient than the SQL text solutions 
> (considering that roughly 5 GB of binary data are added weekly), and it seems 
> easy to implement, so I'll probably try this first. I guess that simply 
> opening the primary database for read before starting the insert/update 
> process would prevent WAL checkpointing until I have a chance to copy the WAL 
> (and SHM) files?

It sounds like that would work. You would need to be careful with your process. 
At a minimum, you need to be sure that full checkpoints don't occur except in 
conjunction with your copies. It would probably be very easy to trigger an 
"accidental" checkpoint simply by running the sqlite3 command-line tool after a 
reboot. Note that the no-auto-checkpoint setting is per-connection, not 
per-database.
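
A minimal sketch of that precaution in C, assuming a hypothetical primary file 
named "primary.db" (error handling omitted; the copy itself is left as a 
placeholder):

  #include <sqlite3.h>

  /* Disable this connection's auto-checkpoint and hold an open read
  ** transaction so the WAL cannot be reset while the -wal and -shm
  ** files are being copied. */
  void copy_wal_under_read_lock(void)
  {
      sqlite3 *db;
      sqlite3_open("primary.db", &db);
      sqlite3_wal_autocheckpoint(db, 0);   /* per-connection setting */
      /* BEGIN is deferred; the SELECT actually starts the read. */
      sqlite3_exec(db, "BEGIN; SELECT count(*) FROM sqlite_master;",
                   0, 0, 0);
      /* ... copy primary.db-wal and primary.db-shm here ... */
      sqlite3_exec(db, "COMMIT;", 0, 0, 0);
      sqlite3_close(db);
  }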

I suppose you could change the checkpoint code so that after a full checkpoint 
the WAL file is renamed (with a counter suffix) rather than deleted. Your 
incremental backup would then just need to copy (and perhaps delete) all of the 
renamed WAL files and process them in order. That would have the advantage that 
you don't need to let the WAL file grow so large, and you don't have to do 
anything to the main database while you are doing the incremental backup. You'd 
still have to be careful with your process: run some sqlite program that 
doesn't have your "changed" checkpoint code, and you might lose a WAL file.

Regards,
Bill




[sqlite] Incremental backup/sync facility?

2016-05-07 Thread Stephan Buchert

Thanks for the suggestions.

The session extension would work; all tables have rowids. The added
flexibility to insert/update the database copies independently of
each other is valuable.

Tracing the inserts/updates also seems a good idea, and it is
available with the present Sqlite version.

Copying the WAL files is probably more efficient than the SQL text
solutions (considering that roughly 5 GB of binary data are added
weekly), and it seems easy to implement, so I'll probably try this
first. I guess that simply opening the primary database for read before
starting the insert/update process would prevent WAL checkpointing
until I have a chance to copy the WAL (and SHM) files?

Splitting this kind of scientific database into smaller files is in my
opinion generally not a good idea: typical use cases are searching for
certain patterns/structures in the data, or running statistical
analyses over longer periods. Several database files, e.g. containing
separate time periods, are then a nuisance. As an example, imagine the
extra programming work for running a simple moving average, FFT, etc.
properly across such artificial time boundaries between database files.

Sqlite is amazingly fast for this amount of data. Thanks to the
log2(size) scaling of selects (on indexed columns), it should not slow
down significantly as the database continues to grow.

/Stephan


[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Hick Gunter
No, you just have to log the bound parameters and a reference to the prepared 
statement (so the other side will know which statement to prepare).
Or just log the statement and the parameters each time.

-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Jim Morris
Sent: Friday, 06 May 2016 20:14
To: sqlite-users at mailinglists.sqlite.org
Subject: Re: [sqlite] Incremental backup/sync facility?

Doesn't this eliminate the use of prepared statements?


On 5/6/2016 11:10 AM, Jeffrey Mattox wrote:
> As an aside, this is how Apple syncs Core Data to iCloud (and then to 
> multiple iOS devices) if the backing store uses SQLite (the default).  When a 
> small amount of data changes (which is common), the changes get sent out, not 
> the entire (mostly unchanged and potentially huge) database.
>
> Jeff
>
>
>> On May 6, 2016, at 7:43 AM, Simon Slavin  wrote:
>>
>> Believe it or not, the fastest way to synchronise the databases is not to 
>> synchronise the databases.  Instead you keep a log of the instructions used 
>> to modify the database.  You might, for example, modify the library that you 
>> use for INSERT, DELETE and UPDATE commands to execute those commands and 
>> also save the command to another 'commandLog' table



___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: hick at scigames.at





[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Jean-Luc Hainaut
At 14:43 on 06/05/2016, Simon Slavin wrote:

>On 6 May 2016, at 1:32pm, Stephan Buchert  wrote:
>
>> The largest database file has now grown to about 180 GB. I need to have
>> copies of the files at at least two different places. The databases are
>> updated regularly as new data from the satellites become available.
>> 
>> Having the copies of the file synced becomes increasingly tedious
>> as their sizes increase. Ideal would be some kind of
>> incremental backup/sync facility.
>
>Believe it or not, the fastest way to synchronise the databases is not to 
>synchronise the databases.  Instead you keep a log of the instructions used to 
>modify the database.  You might, for example, modify the library that you use 
>for INSERT, DELETE and UPDATE commands to execute those commands and also save 
>the command to another 'commandLog' table.  Or perhaps just append those 
>commands to a plain text file.
>
>Then instead of sending any data to the other sites you send this list of 
>commands to the other sites and have them execute them.
>
>Once you start implementing this you'll see that it's more complicated than I 
>have described but the text of your post suggests that you're a good enough 
>programmer to do it properly.

This does not need to be so complicated: the problem can be solved with three 
triggers (per table) that insert the components of the three data modification 
commands into a log table. At set points in time, the contents of this table 
are used to generate the SQL data modification commands for the other DB.
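
A minimal sketch of one such trigger set, for a hypothetical table 
t(id INTEGER PRIMARY KEY, val); quote() renders values, including blobs and 
NULLs, as replayable SQL literals:

  /* Log table plus one trigger per modification command; install
  ** once with sqlite3_exec(db, zLogSchema, 0, 0, 0). */
  static const char *zLogSchema =
    "CREATE TABLE IF NOT EXISTS changelog(seq INTEGER PRIMARY KEY, sql TEXT);"
    "CREATE TRIGGER t_ins AFTER INSERT ON t BEGIN "
    "  INSERT INTO changelog(sql) VALUES ('INSERT INTO t(id,val) VALUES('"
    "    || quote(NEW.id) || ',' || quote(NEW.val) || ');'); "
    "END;"
    "CREATE TRIGGER t_upd AFTER UPDATE ON t BEGIN "
    "  INSERT INTO changelog(sql) VALUES ('UPDATE t SET val=' ||"
    "    quote(NEW.val) || ' WHERE id=' || quote(OLD.id) || ';'); "
    "END;"
    "CREATE TRIGGER t_del AFTER DELETE ON t BEGIN "
    "  INSERT INTO changelog(sql) VALUES ('DELETE FROM t WHERE id=' ||"
    "    quote(OLD.id) || ';'); "
    "END;";

Syncing the other DB is then a matter of executing the sql column in seq order 
and deleting the rows that have been applied.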

J-L Hainaut




[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Simon Slavin

On 6 May 2016, at 3:40pm, Gerry Snyder  wrote:

> One feature of SQLite -- the whole database in one file -- is normally an 
> advantage but becomes less so when the file is huge.

Believe me.  It's still a huge advantage.  Have you ever tried to copy a MySQL 
database off a non-working server by figuring out which files contained it?  
That's three days of my life I wish I could forget.

Simon.


[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Stephan Buchert

We are using Sqlite for data from satellite Earth observations. It
works very well. Thanks to everybody contributing to Sqlite, above all
Dr. Hipp.

The largest database file has now grown to about 180 GB. I need to have
copies of the files at at least two different places. The databases are
updated regularly as new data from the satellites become available.

Having the copies of the file synced becomes increasingly tedious
as their sizes increase. Ideal would be some kind of
incremental backup/sync facility.

I looked at https://www.sqlite.org/backup.html, but, as it does
a complete backup, this takes a long time for such large
databases (files are mounted over networks), and other work gets
blocked.

An alternative is perhaps https://www.sqlite.org/sqldiff.html, but it
is also slow for large databases. Most of the data are binary, and the
SQL input/output of sqldiff/sqlite3 is probably not very efficient.

A solution might be to add an extension to the standard VFS
(http://www.sqlite.org/vfs.html). Each page would need a counter which
increments when the page is modified (and there seems to
be spare space for such counters). Then the backup could test which
pages need to be updated. Is there any chance that such an extension
could be added?

A kind of hack-ish solution might be to update the primary database
files in WAL mode, copy only the WAL file to the secondary place,
and force a WAL checkpoint there. Would this work?
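
If the copied WAL is valid against the secondary database (which is the open 
question here), the forcing step itself could look like this minimal sketch, 
with a hypothetical file name "replica.db":

  #include <sqlite3.h>

  /* After placing the copied WAL at replica.db-wal, open the database
  ** and force a checkpoint so the WAL pages are transferred into the
  ** main file and the WAL is truncated. */
  int checkpoint_replica(void)
  {
      sqlite3 *db;
      int rc, nLog = 0, nCkpt = 0;
      rc = sqlite3_open("replica.db", &db);
      if (rc != SQLITE_OK) return rc;
      rc = sqlite3_wal_checkpoint_v2(db, "main",
               SQLITE_CHECKPOINT_TRUNCATE, &nLog, &nCkpt);
      sqlite3_close(db);
      return rc;
  }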

/Stephan


[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Richard Hipp
On 5/6/16, Hick Gunter  wrote:
> No, you just have to log the bound parameters and a reference to the prepared
> statement (so the other side will know which statement to prepare).
> Or just log the statement and the parameters each time.

The sqlite3_trace() interface fills in the values for the parameters
automatically.
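
A minimal sketch of that, writing to a hypothetical log file "statements.log" 
(one trace callback per connection):

  #include <sqlite3.h>
  #include <stdio.h>

  /* The callback receives the statement text with bound parameter
  ** values already expanded, so the log can be replayed as-is. */
  static void trace_cb(void *pArg, const char *zSql)
  {
      fprintf((FILE *)pArg, "%s;\n", zSql);
  }

  void install_statement_log(sqlite3 *db)
  {
      FILE *fp = fopen("statements.log", "a");
      if (fp) sqlite3_trace(db, trace_cb, fp);
  }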
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Simon Slavin

On 6 May 2016, at 1:32pm, Stephan Buchert  wrote:

> The largest database file has now grown to about 180 GB. I need to have
> copies of the files at at least two different places. The databases are
> updated regularly as new data from the satellites become available.
> 
> Having the copies of the file synced becomes increasingly tedious
> as their sizes increase. Ideal would be some kind of
> incremental backup/sync facility.

Believe it or not, the fastest way to synchronise the databases is not to 
synchronise the databases.  Instead you keep a log of the instructions used to 
modify the database.  You might, for example, modify the library that you use 
for INSERT, DELETE and UPDATE commands to execute those commands and also save 
the command to another 'commandLog' table.  Or perhaps just append those 
commands to a plain text file.

Then instead of sending any data to the other sites you send this list of 
commands to the other sites and have them execute them.

Once you start implementing this you'll see that it's more complicated than I 
have described but the text of your post suggests that you're a good enough 
programmer to do it properly.

This assumes that the structure and primary keys of the tables which hold data 
are constructed in such a way that the order in which new data is entered 
doesn't matter.
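
A minimal sketch of such a library wrapper, assuming a pre-created table 
commandLog(seq INTEGER PRIMARY KEY, sql TEXT) and statement text that is 
self-contained (no bound parameters):

  #include <sqlite3.h>

  /* Execute a modification and append its text to commandLog inside
  ** the same transaction, so the log cannot drift from the data. */
  int exec_and_log(sqlite3 *db, const char *zSql)
  {
      sqlite3_stmt *pLog = 0;
      sqlite3_exec(db, "BEGIN", 0, 0, 0);
      int rc = sqlite3_exec(db, zSql, 0, 0, 0);
      if (rc == SQLITE_OK) {
          sqlite3_prepare_v2(db, "INSERT INTO commandLog(sql) VALUES(?1)",
                             -1, &pLog, 0);
          sqlite3_bind_text(pLog, 1, zSql, -1, SQLITE_TRANSIENT);
          sqlite3_step(pLog);
          sqlite3_finalize(pLog);
          sqlite3_exec(db, "COMMIT", 0, 0, 0);
      } else {
          sqlite3_exec(db, "ROLLBACK", 0, 0, 0);
      }
      return rc;
  }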

Simon.


[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Jeffrey Mattox
As an aside, this is how Apple syncs Core Data to iCloud (and then to multiple 
iOS devices) if the backing store uses SQLite (the default).  When a small 
amount of data changes (which is common), the changes get sent out, not the 
entire (mostly unchanged and potentially huge) database.

Jeff


> On May 6, 2016, at 7:43 AM, Simon Slavin  wrote:
> 
> Believe it or not, the fastest way to synchronise the databases is not to 
> synchronise the databases.  Instead you keep a log of the instructions used 
> to modify the database.  You might, for example, modify the library that you 
> use for INSERT, DELETE and UPDATE commands to execute those commands and also 
> save the command to another 'commandLog' table


[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Hick Gunter
May I suggest using a statement log of the "primary" database to update the 
secondary sites? If you have a sequence counter (separate or embedded) then the 
state of the database could be recovered from a backup (sequence counter x) 
plus the log files for all statements from x+1 to "current".

-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Stephan 
Buchert
Sent: Friday, 06 May 2016 14:32
To: sqlite-users at mailinglists.sqlite.org
Subject: [sqlite] Incremental backup/sync facility?


We are using Sqlite for data from satellite Earth observations. It works very 
well. Thanks to everybody contributing to Sqlite, above all Dr. Hipp.

The largest database file has now grown to about 180 GB. I need to have copies 
of the files at at least two different places. The databases are updated 
regularly as new data from the satellites become available.

Having the copies of the file synced becomes increasingly tedious as their 
sizes increase. Ideal would be some kind of incremental backup/sync facility.

I looked at https://www.sqlite.org/backup.html, but, as it does a complete 
backup, this takes a long time for such large databases (files are mounted over 
networks), and other work gets blocked.

An alternative is perhaps https://www.sqlite.org/sqldiff.html, but it is also 
slow for large databases. Most of the data are binary, and the SQL input/output 
of sqldiff/sqlite3 is probably not very efficient.

A solution might be to add an extension to the standard VFS 
(http://www.sqlite.org/vfs.html). Each page would need a counter which increments 
when the page is modified (and there seems to be spare space for such 
counters). Then the backup could test which pages need to be updated. Is there 
any chance that such an extension could be added?

A kind of hack-ish solution might be to update the primary database files in 
WAL mode, copy only the WAL file to the secondary place, and force a WAL 
checkpoint there. Would this work?

/Stephan


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: hick at scigames.at





[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Jim Morris
Doesn't this eliminate the use of prepared statements?


On 5/6/2016 11:10 AM, Jeffrey Mattox wrote:
> As an aside, this is how Apple syncs Core Data to iCloud (and then to 
> multiple iOS devices) if the backing store uses SQLite (the default).  When a 
> small amount of data changes (which is common), the changes get sent out, not 
> the entire (mostly unchanged and potentially huge) database.
>
> Jeff
>
>
>> On May 6, 2016, at 7:43 AM, Simon Slavin  wrote:
>>
>> Believe it or not, the fastest way to synchronise the databases is not to 
>> synchronise the databases.  Instead you keep a log of the instructions used 
>> to modify the database.  You might, for example, modify the library that you 
>> use for INSERT, DELETE and UPDATE commands to execute those commands and 
>> also save the command to another 'commandLog' table



[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Stephen Chrzanowski
Gerry;

I trashed the email I was going to send.  You had the same line of thought
as me regarding chopping the file on a per-day basis, but what made me
trash it was that any auto-numbered PKs would be a hassle in new files,
unless that information was put into the new DB upon creation.

I agree that when files get large, how the data is stored has to be
re-examined once backing up the sucker becomes a logistical nightmare.
With MySQL, data replication like that is somewhat a breeze.  With
SQLite, the convenience and portability granted by it die when files
become too large to deal with for backup purposes.

On Fri, May 6, 2016 at 10:40 AM, Gerry Snyder wrote:

> On 5/6/2016 5:32 AM, Stephan Buchert wrote:
>
>> We are using Sqlite for data from satellite Earth observations. It
>> works very well. Thanks to everybody contributing to Sqlite, above all
>> Dr. Hipp.
>>
>> The largest database file has now grown to about 180 GB.
>
> One feature of SQLite -- the whole database in one file -- is normally an
> advantage but becomes less so when the file is huge.
>
>> I need to have
>> copies of the files at at least two different places. The databases are
>> updated regularly as new data from the satellites become available.
>
> Others have suggested keeping track of changes using a logging file. That
> allows all the SELECT statements to remain unchanged, while complicating
> the input side of things.
>
> Another approach could be to have new observations go into a separate file
> (one day's worth, or whatever makes sense). This much smaller file could be
> shipped to the other site(s) and then merged into each copy of the main
> database. Almost no changes to the input logic, but every SELECT would have
> to use a JOIN. No idea how painful a process that would be.
>
> Gerry


[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Roger Binns
On 06/05/16 05:32, Stephan Buchert wrote:
> Having the copies of the file synced becomes increasingly tedious
> as their sizes increase. Ideal would be some kind of
> incremental backup/sync facility.

Out of curiosity, would an approach of using multiple databases and
using ATTACH to "unify" them work?  The individual databases could
represent shorter amounts of time (e.g. a week), and, as you state,
smaller files are easier to deal with.
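
A minimal sketch of the unification, with hypothetical file and table names 
(SQLite allows 10 attached databases by default, up to 125 at compile time):

  /* Attach the weekly files and hide the split behind a view, so
  ** existing SELECTs only need to be pointed at obs_all; install
  ** with sqlite3_exec(db, zUnify, 0, 0, 0). */
  static const char *zUnify =
    "ATTACH 'obs_2016w18.db' AS w18;"
    "ATTACH 'obs_2016w19.db' AS w19;"
    "CREATE TEMP VIEW obs_all AS "
    "  SELECT * FROM main.obs "
    "  UNION ALL SELECT * FROM w18.obs "
    "  UNION ALL SELECT * FROM w19.obs;";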

Roger





[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Richard Hipp
On 5/6/16, Richard Hipp  wrote:
> On 5/6/16, Simon Slavin  wrote:
>>
>> Believe it or not, the fastest way to synchronise the databases is not to
>> synchronise the databases.  Instead you keep a log of the instructions
>> used
>> to modify the database.
>
> Or, this might be an even better solution.  Note that the
> sqlite3_trace() function (see
> https://www.sqlite.org/c3ref/profile.html) can help here.
>
> A third solution would be to use the session extension, which will be
> merged to trunk on the next release.  See
> https://www.sqlite.org/c3ref/profile.html

I bungled the copy/paste of that URL.  Should be
https://www.sqlite.org/draft/sessionintro.html

> for details.  One advantage
> of using the session extension is that it will work even if the two
> database files are not identical at the onset.  A disadvantage is that
> the session extension only works with rowid tables, not WITHOUT ROWID
> tables.
>
> --
> D. Richard Hipp
> drh at sqlite.org
>


-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Richard Hipp
On 5/6/16, Simon Slavin  wrote:
>
> Believe it or not, the fastest way to synchronise the databases is not to
> synchronise the databases.  Instead you keep a log of the instructions used
> to modify the database.

Or, this might be an even better solution.  Note that the
sqlite3_trace() function (see
https://www.sqlite.org/c3ref/profile.html) can help here.

A third solution would be to use the session extension, which will be
merged to trunk on the next release.  See
https://www.sqlite.org/c3ref/profile.html for details.  One advantage
of using the session extension is that it will work even if the two
database files are not identical at the onset.  A disadvantage is that
the session extension only works with rowid tables, not WITHOUT ROWID
tables.
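
A minimal sketch of that workflow, for a hypothetical table "obs", assuming a 
build with SQLITE_ENABLE_SESSION and SQLITE_ENABLE_PREUPDATE_HOOK:

  #include <sqlite3.h>

  /* Skip conflicting changes; a real deployment would want a
  ** smarter policy. */
  static int on_conflict(void *pCtx, int eConflict,
                         sqlite3_changeset_iter *pIter)
  {
      (void)pCtx; (void)eConflict; (void)pIter;
      return SQLITE_CHANGESET_OMIT;
  }

  int ship_changes(sqlite3 *dbPrimary, sqlite3 *dbReplica)
  {
      sqlite3_session *pSession;
      void *pChangeset = 0;
      int nChangeset = 0;
      int rc = sqlite3session_create(dbPrimary, "main", &pSession);
      if (rc != SQLITE_OK) return rc;
      sqlite3session_attach(pSession, "obs");

      /* ... run the week's INSERTs/UPDATEs/DELETEs on dbPrimary ... */

      rc = sqlite3session_changeset(pSession, &nChangeset, &pChangeset);
      if (rc == SQLITE_OK) {
          /* The changeset blob is what would be shipped to the other
          ** site; here it is applied directly. */
          rc = sqlite3changeset_apply(dbReplica, nChangeset, pChangeset,
                                      0, on_conflict, 0);
          sqlite3_free(pChangeset);
      }
      sqlite3session_delete(pSession);
      return rc;
  }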

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Richard Hipp
On 5/6/16, Stephan Buchert  wrote:
>
> A kind of hack-ish solution might be to update the primary database
> files in WAL mode, copy only the WAL file to the secondary place,
> and force a WAL checkpoint there. Would this work?
>

This sounds like the most promising solution to me.  We'll think on it some.
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Incremental backup/sync facility?

2016-05-06 Thread Gerry Snyder
On 5/6/2016 5:32 AM, Stephan Buchert wrote:
> We are using Sqlite for data from satellite Earth observations. It
> works very well. Thanks to everybody contributing to Sqlite, above all
> Dr. Hipp.
>
> The largest database file has now grown to about 180 GB.

One feature of SQLite -- the whole database in one file -- is normally 
an advantage but becomes less so when the file is huge.


>   I need to have
> copies of the files at at least two different places. The databases are
> updated regularly as new data from the satellites become available.

Others have suggested keeping track of changes using a logging file. 
That allows all the SELECT statements to remain unchanged, while 
complicating the input side of things.

Another approach could be to have new observations go into a separate 
file (one day's worth, or whatever makes sense). This much smaller file 
could be shipped to the other site(s) and then merged into each copy of 
the main database. Almost no changes to the input logic, but every 
SELECT would have to use a JOIN. No idea how painful a process that 
would be.
>
> Having the copies of the file synced becomes increasingly tedious
> as their sizes increase. Ideal would be some kind of
> incremental backup/sync facility.
>
> I looked at https://www.sqlite.org/backup.html, but, as it does
> a complete backup, this takes a long time for such large
> databases (files are mounted over networks), and other work gets
> blocked.
>
> An alternative is perhaps https://www.sqlite.org/sqldiff.html, but it
> is also slow for large databases. Most of the data are binary, and the
> SQL input/output of sqldiff/sqlite3 is probably not very efficient.
>
> A solution might be to add an extension to the standard VFS
> http://www.sqlite.org/vfs.html. Each page would need a counter which
> increments when the page is modified (and there seems to
> be spare space for such counters). Then the backup could test which
> pages need to be updated. Is there any chance that such an extension
> could be added?
>
> A kind of hack-ish solution might be to update the primary database
> files in WAL mode, copy only the WAL file to the secondary place,
> and force a WAL checkpoint there. Would this work?
>
> /Stephan
>
Gerry