Re: fuseki backup process / policy - similar capabilities to autopostgresqlbackup ?

Eugen Stan Wed, 31 Aug 2022 07:15:35 -0700

Done https://github.com/apache/jena/issues/1500 .


Thanks.

Will see if I have time to contribute a solution.
But I am busy in the next week or so.
If anyone is interested in providing a fix, please let me know.

Regards,
Eugen

On 30.08.2022 17:13, Andy Seaborne wrote:

On 30/08/2022 12:17, Eugen Stan wrote:
Hi Andy,

Thanks for the feedback.
I think we are in agreement.
Nice touch with cleanup on server startup :).

Should I raise a JIRA issue for the server side bits?
Yes please, or a github issue (we use both)

https://github.com/apache/jena/issues

(The codebase already has some "safe write" code in IOX.safeWrite)

     Andy
I will set up the backup script as a separate git repo.

Thanks,
Eugen

On 30.08.2022 13:02, Andy Seaborne wrote:
Hi Eugen,
Yes, the backup should be written then atomically moved (i.e. same directory). Cleanup would then be "delete" by pattern in the server startup script.
As to putting a process script around the functionality, it is an external script which needs access to the server file area (to know the state of backups). The file system state is the definitive state - not the jobs (that's a UI feature).
This would make a good independent project or contribution. Or a published example as a starting point, because the requirements will depend on the deployment environment and it seems unlikely to me that there is a one size fits all.
Fuseki should make sure it has the right behaviours (like atomic write).

     Andy

autopostgresqlbackup itself is GPL.

On 29/08/2022 11:20, Eugen Stan wrote:
Hello,
We are using fuseki and we would like to implement a backup policy similar in capabilities to what [autopostgresqlbackup] has to offer.
Are there any existing solutions out there that can do all / part of these?
We would like to take:
* daily backups for a week
* weekly backups - 1 per week, last 4 weeks
* monthly backups - 1/ month, last 6 months
I believe this could be scripted via the HTTP API + directory access.
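The retention policy listed above could be sketched roughly as follows. This is only an illustration: the function name `select_backups_to_keep` and its parameters are hypothetical, and it assumes the backup dates have already been extracted from the file names in the backup directory. "Weekly" and "monthly" are interpreted here as the newest backup in each of the most recent ISO weeks / calendar months that actually contain a backup.

```python
from datetime import date, timedelta


def select_backups_to_keep(backup_dates, today, daily=7, weekly=4, monthly=6):
    """Return the set of backup dates to keep (hypothetical helper).

    daily   - keep every backup taken in the last `daily` days
    weekly  - keep the newest backup of each of the last `weekly` ISO weeks
    monthly - keep the newest backup of each of the last `monthly` months
    Weeks and months without any backup are skipped, not counted.
    """
    dates = sorted(set(backup_dates), reverse=True)  # newest first
    keep = set()

    # Daily: everything within the last `daily` days, today included.
    for d in dates:
        if 0 <= (today - d).days < daily:
            keep.add(d)

    # Weekly: because dates are newest first, setdefault keeps the newest
    # backup seen for each (ISO year, ISO week) key.
    newest_per_week = {}
    for d in dates:
        newest_per_week.setdefault(tuple(d.isocalendar())[:2], d)
    for key in sorted(newest_per_week, reverse=True)[:weekly]:
        keep.add(newest_per_week[key])

    # Monthly: same idea, keyed by (year, month).
    newest_per_month = {}
    for d in dates:
        newest_per_month.setdefault((d.year, d.month), d)
    for key in sorted(newest_per_month, reverse=True)[:monthly]:
        keep.add(newest_per_month[key])

    return keep
```

Anything in the backup directory whose date is not in the returned set would be a candidate for deletion; the actual deletion is left to the calling script.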
The backup api in [fuseki-server-protocol] can trigger a backup and can also list existing backups.
Unfortunately in the current implementation, backup is not consistent.
In case of a server crash during backup, the file will remain there incomplete. Also, since tasks are stored in memory and cleaned (periodically / on restart) there is no way to know for sure if the backup was successful or not.
I have encountered the above quite often in some workloads.
The inconsistency could be solved by writing the backup to a temporary file name and, on completion, renaming it to the final file name.
Rename is usually an atomic operation on POSIX file systems.
/backup-list API can list all backups or split backups in complete / incomplete. IMO for now, it can list all of them.
The in-progress backup could be stored alongside the other backups with a file marker like: dataset_date.nq.gz.INCOMPLETE .
Once it's done it can be renamed to dataset_date.nq.gz .
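A minimal sketch of the write-then-rename idea, in Python purely for illustration (Fuseki itself is Java, and this is not its implementation). The function name `write_backup_atomically` and the `chunks` argument are hypothetical; the `.INCOMPLETE` suffix follows the naming proposed above. The temporary file lives in the same directory as the final file so that the rename stays on one file system.

```python
import gzip
import os
from pathlib import Path


def write_backup_atomically(final_path, chunks):
    """Write gzip-compressed data to <final_path>.INCOMPLETE, then rename.

    `chunks` is any iterable of bytes objects (assumption: the caller
    produces the serialized dataset). A crash before the rename leaves
    only the .INCOMPLETE file behind, never a truncated final file.
    """
    final_path = Path(final_path)
    tmp_path = final_path.with_name(final_path.name + ".INCOMPLETE")

    with open(tmp_path, "wb") as raw:
        with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            for chunk in chunks:
                gz.write(chunk)
        # Push the data to disk before the file becomes visible
        # under its final name.
        raw.flush()
        os.fsync(raw.fileno())

    # os.replace maps to rename(), which is atomic on POSIX file systems
    # when source and destination are on the same file system.
    os.replace(tmp_path, final_path)
    return final_path
```

With this scheme, any file without the `.INCOMPLETE` suffix can be treated as a finished backup by an external script, without consulting the in-memory task list.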
Cleanup might be handled externally. In case of a crash, the file will remain INCOMPLETE until it is removed by the system, by checking that a specific amount of time has passed since the backup was started (1-2 days).
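The external cleanup could look something like the sketch below. The function name `remove_stale_incomplete` and the two-day default are assumptions for illustration; file modification time is used as a stand-in for "time since the backup was started", which is only an approximation because a running backup keeps updating it.

```python
import time
from pathlib import Path


def remove_stale_incomplete(backup_dir, max_age_seconds=2 * 24 * 3600, now=None):
    """Delete *.INCOMPLETE files not modified for `max_age_seconds`.

    Returns the sorted names of the files that were removed.
    `now` can be passed in (seconds since the epoch) to make runs
    reproducible; it defaults to the current time.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(backup_dir).glob("*.INCOMPLETE"):
        age = now - path.stat().st_mtime
        if age > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Run from cron or a systemd timer, this leaves recent in-progress backups alone and only removes leftovers from crashed runs.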
WDYT?


[autopostgresqlbackup] https://github.com/k0lter/autopostgresqlbackup
[fuseki-server-protocol] https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html
Thanks,
z


--
Eugen Stan

+40770 941 271  / https://www.netdava.com

