Hi, one thing people often do to increase availability is to use replication. Having replicated RDF stores also allows you to load-balance requests across multiple machines in order to increase query throughput. Replicas can act as a sort of backup, but you need to be careful: errors are replicated as well, so replication, in general, does not eliminate the need for backups. (What's the best way to back up a TDB store?)
However, in the presence of updates you are left with the problem of keeping your replicas in sync. For simplicity, I was thinking of trying a master/slave(s) architecture: one Fuseki server acting as master, running with the --update option, and a few replicas running in read-only mode. I am thinking of using rsync to sync the TDB indexes between master and slaves. However, the replication needs to be coordinated, and updates must be forbidden while it is in progress. The master should become read-only and sync everything to disk, and only then can replication start. Something similar needs to happen on the slaves: each one should probably be taken offline while replication is going on, and Fuseki restarted as soon as replication finishes. I expect replication to be quite fast after the first run, since rsync only transfers what has changed.
I am sending this email to validate my thinking and to ask whether anyone else has used rsync to manage replicated TDB stores, with or without Fuseki.
Thank you,
Paolo
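P.S. To make the procedure above a bit more concrete, here is a rough Python sketch of the coordination I have in mind. It is only an illustration under several assumptions: rsync runs over SSH from the master to each slave, the host names and directory paths are made up, and pause_master, resume_master, stop_replica and start_replica are hypothetical callbacks, because I do not know yet how best to switch Fuseki to read-only or restart it remotely. The rsync options used (-a and --delete) are standard ones.

```python
import subprocess
from typing import Callable, List, Sequence


def build_rsync_command(src_dir: str, host: str, dest_dir: str) -> List[str]:
    """Build an rsync command that mirrors src_dir onto host:dest_dir."""
    # Trailing slashes make rsync copy the directory contents,
    # not the directory itself.
    src = src_dir.rstrip("/") + "/"
    dest = f"{host}:{dest_dir.rstrip('/')}/"
    # -a preserves permissions and timestamps; --delete removes files that
    # no longer exist on the master, so the replica is an exact mirror.
    return ["rsync", "-a", "--delete", src, dest]


def run_checked(cmd: List[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


def replicate(
    master_dir: str,
    replicas: Sequence[str],
    replica_dir: str,
    pause_master: Callable[[], None],      # hypothetical: make master read-only, flush to disk
    resume_master: Callable[[], None],     # hypothetical: allow updates again
    stop_replica: Callable[[str], None],   # hypothetical: take Fuseki on a slave offline
    start_replica: Callable[[str], None],  # hypothetical: restart Fuseki on a slave
    run: Callable[[List[str]], None] = run_checked,
) -> None:
    """Sync the master's TDB directory to each replica, one at a time."""
    pause_master()
    try:
        for host in replicas:
            stop_replica(host)
            # If rsync fails, the exception propagates and this replica
            # stays offline rather than serving a half-copied store.
            run(build_rsync_command(master_dir, host, replica_dir))
            start_replica(host)
    finally:
        # Always let the master accept updates again, even after a failure.
        resume_master()
```

Syncing the slaves one at a time means the others keep answering queries, at the price of replicas briefly serving different versions of the data. Whether that is acceptable depends on the application.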
