[GitHub] [jena] ieugen opened a new issue, #1500: Improve fuseki backup to consider failures (fuseki crash) and clean up incomplete backups

GitBox Wed, 31 Aug 2022 07:13:29 -0700


ieugen opened a new issue, #1500:
URL: https://github.com/apache/jena/issues/1500


   ### Version
   
   4.6.0
   
   ### Feature
   
   This was asked on the mailing list: 
https://lists.apache.org/thread/rdt5otow263xhvwymfsgnxwwy2bxh60r
   
   > We are using fuseki and we would like to implement a backup policy similar 
in capabilities to what [autopostgresqlbackup] has to offer.
   Are there any existing solutions out there that can do all / part of these?
   > We would like to take:
   > * daily backups for a week
   > * weekly backups - 1 per week, last 4 weeks
   > * monthly backups - 1/ month, last 6 months
   I believe this could be scripted via the HTTP API plus directory access.
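
The retention policy listed above could be sketched roughly as follows. This is only an illustration in Python, not part of Fuseki: the function name `backups_to_keep` and its parameters are hypothetical, and "last 4 weeks" / "last 6 months" are interpreted here as the most recent ISO weeks and calendar months that actually contain a backup.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily=7, weekly=4, monthly=6):
    """Return the set of backup dates to retain (hypothetical helper)."""
    dates = sorted(set(backup_dates), reverse=True)  # newest first
    keep = set()

    # Daily: every backup taken within the last `daily` days.
    keep.update(d for d in dates if 0 <= (today - d).days < daily)

    # Weekly: the newest backup of each of the most recent `weekly` ISO weeks.
    weeks = {}
    for d in dates:
        weeks.setdefault(tuple(d.isocalendar()[:2]), d)  # first seen is newest
    keep.update(list(weeks.values())[:weekly])

    # Monthly: the newest backup of each of the most recent `monthly` months.
    months = {}
    for d in dates:
        months.setdefault((d.year, d.month), d)
    keep.update(list(months.values())[:monthly])

    return keep
```

Anything in the backup directory whose date is not in the returned set would be a candidate for deletion by the external script.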
   The backup api in [fuseki-server-protocol] can trigger a backup and can also 
list existing backups.
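
As a rough sketch, the two calls could be built like this. The endpoint paths `/$/backup/{name}` and `/$/backups-list` are my reading of the Fuseki administration protocol and should be checked against the deployed version; the helper names and the base URL are assumptions. The requests are only constructed here, not sent.

```python
import urllib.request
from urllib.parse import quote


def backup_request(base_url, dataset):
    """Build (but do not send) a request asking the server to start a backup."""
    name = quote(dataset.strip("/"), safe="")
    url = f"{base_url.rstrip('/')}/$/backup/{name}"
    return urllib.request.Request(url, method="POST")


def backups_list_request(base_url):
    """Build (but do not send) a request for the list of existing backups."""
    url = f"{base_url.rstrip('/')}/$/backups-list"
    return urllib.request.Request(url, method="GET")
```

A script would pass the result to `urllib.request.urlopen(...)`, adding whatever authentication the admin endpoints require in that deployment.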
   Unfortunately, in the current implementation, the backup is not consistent.
   > In case of a server crash during backup, the file will remain there 
incomplete.
   
   > Also, since tasks are stored in memory and cleaned (periodically / on 
restart) there is no way to know for sure if the backup was successful or not.
   I have encountered the above quite often in some workloads.
   
   > The inconsistency could be solved by writing the backup to a temporary file 
name and, on completion, renaming it to the final file name.
   Rename is usually an atomic operation on POSIX file systems.
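
The write-then-rename idea can be illustrated with a small Python sketch. This is not how Fuseki (which is Java) implements backups; the function name `write_backup_atomically` and the `.INCOMPLETE` suffix used for the temporary name are assumptions for illustration.

```python
import gzip
import os


def write_backup_atomically(final_path, chunks):
    """Write gzip data to a temporary name, then rename it into place."""
    # The temporary file lives in the same directory as the final file,
    # so the rename never crosses a file system boundary.
    tmp_path = final_path + ".INCOMPLETE"
    with open(tmp_path, "wb") as raw:
        with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            for chunk in chunks:
                gz.write(chunk)
        # Push the data to disk before the rename makes the file visible.
        raw.flush()
        os.fsync(raw.fileno())
    # os.replace overwrites the target; on POSIX this is an atomic rename.
    os.replace(tmp_path, final_path)
    return final_path
```

If the process dies before `os.replace`, only the temporary name exists, so a reader never sees a truncated file under the final name.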
   
   > /backup-list API can list all backups or split backups in complete / 
incomplete. IMO for now, it can list all of them.
   
   > The in-progress backup could be stored alongside the other backups with a 
file marker such as dataset_date.nq.gz.INCOMPLETE .
   Once it is done, it can be renamed to dataset_date.nq.gz .
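
With such a marker, an external script could split the directory listing into complete and incomplete backups. A minimal sketch, assuming the `.INCOMPLETE` suffix proposed above and a hypothetical helper name `classify_backups`:

```python
from pathlib import Path

MARKER = ".INCOMPLETE"  # suffix proposed in this issue, not an existing convention


def classify_backups(backup_dir):
    """Group the file names in backup_dir by whether they carry the marker."""
    complete, incomplete = [], []
    for path in sorted(Path(backup_dir).iterdir()):
        if not path.is_file():
            continue
        if path.name.endswith(MARKER):
            incomplete.append(path.name)
        else:
            complete.append(path.name)
    return {"complete": complete, "incomplete": incomplete}
```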
   
   > Cleanup might be handled externally. In case of a crash, the file will 
remain INCOMPLETE until the system removes it, after checking that a specific 
amount of time (1-2 days) has passed since the backup was started.
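
An external cleanup along these lines might look like the sketch below. The helper name `remove_stale_incomplete` is hypothetical, and the file's modification time is used as a stand-in for "time since the backup was started", which is an approximation (it is the time of the last write).

```python
import time
from pathlib import Path


def remove_stale_incomplete(backup_dir, max_age_seconds=2 * 24 * 3600, now=None):
    """Delete *.INCOMPLETE files last modified more than max_age_seconds ago."""
    now = time.time() if now is None else now
    removed = []
    # Materialise the listing first so deletions do not disturb the iteration.
    for path in list(Path(backup_dir).glob("*.INCOMPLETE")):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

The two-day default mirrors the 1-2 day window suggested above; a backup that is still running keeps updating its modification time and is therefore left alone.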
   
   @afs replied:
   
   > Yes, the backup should be written then atomically moved (i.e. same 
directory). Cleanup would then be "delete" by pattern in the server startup 
script.
   > As to putting a process script around the functionality, it is an external 
script which needs access to the server file area (to know the state of 
backups). The file system state is the definitive state - not the jobs (that's 
a UI feature).
   
   > This would make a good independent project or contribution, or a published 
example as a starting point, because the requirements will depend on the 
deployment environment and it seems unlikely to me that there is a 
one-size-fits-all solution.
   
   > (The codebase already has some "safe write" code in IOX.safeWrite) 
   
   
   ### Are you interested in contributing a solution yourself?
   
   Perhaps?

