Re: fuseki backup process / policy - similar capabilities to autopostgresqlbackup ?
Hello,

I took some time to implement a program that does backups following a policy. To make such a program easier to write, I think it would help to add the name of the database being backed up to the tasks JSON output. Right now we get:

    [ { "task" : "Backup" , "taskId" : "1" , "started" : "2022-10-11T16:25:47.083+00:00" } ]

From this I don't know which DB is being backed up. With several tasks in progress, the name would help to tell which one is done and which is still running.

Regarding the backup program, I was checking how autopostgresqlbackup works in order to implement something similar. autopostgresqlbackup works synchronously, which keeps its logic simple. On the Fuseki side, I need to know when the task is done. Since the tasks API is async, my plan is to poll the tasks API and check for the DB name. I could also use the DB name + date from the JSON reply to form the file name instead of parsing it.

Let me know if you have other ideas on how this should be done.
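A minimal sketch of the polling plan described above, in Python. It assumes that a completed task gains a "finished" timestamp in the tasks JSON (as in the examples in the Fuseki server protocol documentation) and that the task list is served at /$/tasks without authentication. The names find_task, wait_for_task and fetch_tasks_http, and the base URL, are hypothetical and only for illustration.

```python
import json
import time
import urllib.request


def find_task(tasks, task_id):
    """Return the entry whose taskId matches, or None if it is not listed."""
    for task in tasks:
        if str(task.get("taskId")) == str(task_id):
            return task
    return None


def wait_for_task(fetch_tasks, task_id, interval=5.0, timeout=3600.0,
                  sleep=time.sleep):
    """Poll the task list until the task reports a "finished" timestamp.

    fetch_tasks is any callable returning the decoded JSON task list.
    Assumption: a finished task carries a "finished" field.
    """
    waited = 0.0
    while True:
        task = find_task(fetch_tasks(), task_id)
        if task is None:
            # Tasks live in server memory, so a restart forgets them.
            raise LookupError(f"task {task_id} is not listed by the server")
        if "finished" in task:
            return task
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} still running after {waited}s")
        sleep(interval)
        waited += interval


def fetch_tasks_http(base_url="http://localhost:3030"):
    """Fetch the task list over HTTP (assumed endpoint: /$/tasks)."""
    with urllib.request.urlopen(f"{base_url}/$/tasks") as response:
        return json.load(response)
```

Usage would be along the lines of wait_for_task(fetch_tasks_http, "1"). Since the task list is lost on restart, a LookupError here says nothing about whether the backup file is complete; the file system remains the place to check.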
Re: fuseki backup process / policy - similar capabilities to autopostgresqlbackup ?
Done https://github.com/apache/jena/issues/1500 .

Thanks. Will see if I have time to contribute a solution. But I am busy in the next week or so. If anyone is interested in providing a fix, please let me know.

Regards,
Eugen
Re: fuseki backup process / policy - similar capabilities to autopostgresqlbackup ?
On 30/08/2022 12:17, Eugen Stan wrote:
> Should I raise a JIRA issue for the server side bits?

Yes please, or a github issue (we use both): https://github.com/apache/jena/issues

(The codebase already has some "safe write" code in IOX.safeWrite)

    Andy
Re: fuseki backup process / policy - similar capabilities to autopostgresqlbackup ?
Hi Andy,

Thanks for the feedback. I think we are in agreement. Nice touch with cleanup on server startup :).

Should I raise a JIRA issue for the server side bits?

I will set up the backup script as a separate git repo.

Thanks,
Eugen
Re: fuseki backup process / policy - similar capabilities to autopostgresqlbackup ?
Hi Eugen,

Yes, the backup should be written then atomically moved (i.e. within the same directory). Cleanup would then be "delete by pattern" in the server startup script.

As to putting a process script around the functionality, it is an external script which needs access to the server file area (to know the state of backups). The file system state is the definitive state, not the jobs (that's a UI feature).

This would make a good independent project or contribution, or a published example as a starting point, because the requirements will depend on the deployment environment and it seems unlikely to me that there is a one-size-fits-all solution. Fuseki should make sure it has the right behaviours (like atomic write).

    Andy

autopostgresqlbackup itself is GPL.
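The two behaviours discussed here (write under a temporary name, move atomically within the same directory, and delete leftovers by pattern at startup) could look roughly like the Python sketch below. It is not Fuseki's implementation and it is not IOX.safeWrite. The function names and the chunk-iterator interface are hypothetical; the .INCOMPLETE suffix follows the naming proposed in the thread.

```python
import gzip
import os
from pathlib import Path

INCOMPLETE_SUFFIX = ".INCOMPLETE"  # marker proposed in the thread


def write_backup_atomically(final_path, chunks):
    """Write gzip-compressed chunks to <name>.INCOMPLETE, then rename.

    The temporary file sits in the same directory as the final file, so the
    rename stays on one file system, where POSIX rename is atomic.
    """
    final_path = Path(final_path)
    tmp_path = final_path.with_name(final_path.name + INCOMPLETE_SUFFIX)
    with open(tmp_path, "wb") as raw:
        with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            for chunk in chunks:
                gz.write(chunk)
        # Push the bytes to disk before the file becomes visible under
        # its final name.
        raw.flush()
        os.fsync(raw.fileno())
    os.replace(tmp_path, final_path)
    return final_path


def remove_incomplete(backup_dir):
    """Startup cleanup: delete every leftover *.INCOMPLETE file by pattern."""
    removed = []
    for path in sorted(Path(backup_dir).glob("*" + INCOMPLETE_SUFFIX)):
        path.unlink()
        removed.append(path.name)
    return removed
```

With this layout any file without the marker is a complete backup, so an external script can rely on the directory listing alone and does not need the in-memory task list.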
fuseki backup process / policy - similar capabilities to autopostgresqlbackup ?
Hello,

We are using Fuseki and we would like to implement a backup policy similar in capabilities to what [autopostgresqlbackup] has to offer. Are there any existing solutions out there that can do all or part of this?

We would like to take:

* daily backups for a week
* weekly backups - 1 per week, last 4 weeks
* monthly backups - 1 per month, last 6 months

I believe this could be scripted via the HTTP API + directory access. The backup API in [fuseki-server-protocol] can trigger a backup and can also list existing backups.

Unfortunately, in the current implementation, backup is not consistent. In case of a server crash during backup, the incomplete file remains in place. Also, since tasks are stored in memory and cleaned (periodically / on restart), there is no way to know for sure whether the backup was successful. I have encountered the above quite often in some workloads.

The inconsistency could be solved by writing the backup to a temporary file name and, on completion, renaming it to the final file name. Rename is usually an atomic operation on POSIX file systems.

The /backup-list API can list all backups or split backups into complete / incomplete. IMO for now, it can list all of them.

The in-progress backup could be stored alongside the other backups with a file marker like dataset_date.nq.gz.INCOMPLETE. Once it's done it can be renamed to dataset_date.nq.gz.

Cleanup might be handled externally. In case of a crash, the file will remain INCOMPLETE until it is removed by the system once a specific amount of time has passed since the backup was started (1-2 days).

WDYT?

[autopostgresqlbackup] https://github.com/k0lter/autopostgresqlbackup
[fuseki-server-protocol] https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html

Thanks,
-- 
Eugen Stan
https://www.netdava.com
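The retention policy listed above (daily for a week, weekly for 4 weeks, monthly for 6 months) can be sketched in Python as a function that picks which backup dates to keep. This is one possible reading, not how autopostgresqlbackup does it: "weekly" is taken as the newest backup in each of the 4 most recent ISO weeks that have one, and "monthly" as the newest backup in each of the 6 most recent months that have one. The function name and parameters are hypothetical.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily=7, weekly=4, monthly=6):
    """Return the set of backup dates the policy retains."""
    dates = sorted(set(backup_dates), reverse=True)  # newest first
    keep = set()

    # Daily: every backup taken within the last `daily` days.
    keep.update(d for d in dates if 0 <= (today - d).days < daily)

    # Weekly: newest backup of each ISO (year, week), most recent weeks first.
    newest_per_week = {}
    for d in dates:
        newest_per_week.setdefault(tuple(d.isocalendar())[:2], d)
    keep.update(list(newest_per_week.values())[:weekly])

    # Monthly: newest backup of each (year, month), most recent months first.
    newest_per_month = {}
    for d in dates:
        newest_per_month.setdefault((d.year, d.month), d)
    keep.update(list(newest_per_month.values())[:monthly])

    return keep


if __name__ == "__main__":
    today = date(2022, 8, 29)
    existing = [today - timedelta(days=i) for i in range(60)]
    kept = backups_to_keep(existing, today)
    for d in sorted(set(existing) - kept):
        print("would delete backup from", d.isoformat())
```

An external script would map each date back to a file name such as dataset_date.nq.gz and delete the rest, skipping files that still carry the INCOMPLETE marker.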