On 09.09.2011, at 14:55, Andy Seaborne wrote:
>
> On 09/09/11 13:06, Thorsten Möller wrote:
>>
>> On 09.09.2011, at 00:20, Paolo Castagna wrote:
>>
>>> Hi, one feature I miss from Fuseki (and probably everybody who
>>> wants to use Fuseki to run SPARQL endpoints, public or private) is
>>> "high availability".
>>>
>>> One way to increase the availability of your SPARQL endpoints is
>>> to use "replication": you serve the same data with multiple
>>> machines and you put a load balancer in front of them,
>>> distributing queries to those machines.
>>>
>>> I'd like something relatively easy to set up, with close to zero
>>> admin cost at runtime, which would make it extremely easy to
>>> provision a new replica if necessary. I'd like something so easy
>>> that I could implement it in a couple of days.
>>>
>>> Here are some thoughts (inspired by Solr's master/slave
>>> architecture and replication)...
>>>
>>> One master, N slaves, all running exactly the same Fuseki code
>>> base. All writes go to the master. If the master is down, no
>>> writes are possible (this is the price to pay in exchange for
>>> extreme simplicity). All reads go to the N slaves (+ master?).
>>> Slaves might be out of sync with each other (again, the price of
>>> extreme simplicity). Every once in a while (configurable), each
>>> slave contacts the master to check whether it has received updates
>>> since the slave last checked. An additional servlet needs to be
>>> added to Fuseki to coordinate replication. Replication happens
>>> simply by running rsync (from Java), with the master pushing data
>>> out to a remote slave. During replication the master must acquire
>>> an exclusive write lock, sync the TDB indexes to disk, rsync with
>>> the slave, and release the lock when finished or after a timeout
>>> long enough for a first sync to succeed. Slaves need to drop any
>>> TDB caches after an rsync.
>>>
>>> The things I like about this approach are:
>>>
>>> - it's very simple
>>> - after the first time, rsync is incremental and quite fast
>>> - provisioning a new slave is trivial
>>> - it could be implemented relatively quickly
>>> - if the master receives an update while one or more slaves are
>>>   down, the slaves will catch up later
>>>
>>> The things I do not like are:
>>>
>>> - the master is a SPOF for writes
>>> - you need an external load balancer to send requests to the
>>>   slave(s)
>>> - slaves can be out of sync for a period of time
>>> - can a slave answer queries during the replication with rsync?
>>>
>>> What do you think?
>>
>> That's all well thought out, but it's more an approach to
>> scalability than to availability. The reason is that the single
>> write master remains a SPOF; hence, the availability for writes
>> does not change. Of course, one could assume that the master has
>> 100% availability, but then there would be no need to do anything
>> about availability in the first place.
>
> True, but.
>
> Make the front-end coordinator a simple router of requests. It
> tracks the state of the cluster but is not itself a database node.
>
> This greatly reduces the chance of the SPOF actually failing.
>
> To really remove the SPOF, replicate the coordinator and use
> ZooKeeper (or similar) to assist in tracking the version state -
> there is a cost to coordination.
>
> One greatly simplifying assumption is to design for a datacentre,
> not for geographic distribution.
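Such a router can indeed be kept very thin. A minimal sketch of the
idea (illustrative only: the MASTER/SLAVES endpoints are made up, and
a real version would track node health - e.g. via ZooKeeper, as you
say - rather than assume every slave is up):

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    /**
     * Stateless front-end router: all updates go to the single master,
     * reads are spread round-robin over the slaves. It holds no data
     * itself, so restarting a failed router is cheap.
     */
    public class RequestRouter {
        private static final String MASTER = "http://master:3030/dataset";  // assumed endpoint
        private static final List<String> SLAVES = Arrays.asList(           // assumed endpoints
                "http://slave1:3030/dataset",
                "http://slave2:3030/dataset");

        private final AtomicInteger next = new AtomicInteger(0);

        /** Pick the backend for a request: writes to the master, reads round-robin. */
        public String route(boolean isUpdate) {
            if (isUpdate)
                return MASTER;
            // Mask the sign bit so the index stays valid after counter wrap-around.
            int i = (next.getAndIncrement() & 0x7fffffff) % SLAVES.size();
            return SLAVES.get(i);
        }
    }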
I would also add that designing for geographic distribution is
somewhat against the SemWeb vision: it's linked data, not distributed
data (but clients may aggregate).

Thorsten

> The partition problem can be simplified.
>
> Below ...
>
>>> I don't have an alternative that would be as simple to implement
>>> and that would increase the availability (in particular for reads)
>>> of a bunch of machines running Fuseki.
>>
>> Replication is inherently a tradeoff between consistency and
>> performance; recall the famous CAP theorem [1].
>>
>>> Replication is good at increasing your throughput in terms of
>>> queries/sec, and it can have a minor effect in reducing query
>>> response time because you have fewer concurrent queries per
>>> machine.
>>>
>>> Caching will also be very useful, and this is by no means a
>>> substitute for that. We would love to have both! The two things
>>> can and, IMHO, should coexist. In my opinion, availability should
>>> come first and caching soon after. The problem with caching first,
>>> with no availability, is that you have a problem with cache
>>> misses: your users don't get their responses back.
>>>
>>> Another approach would be to look at the Journal in TxTDB and see
>>> if we could have a Journal which would replicate changes to remote
>>> machines.
>>>
>>> Another alternative would be to submit a SPARQL Update or a write
>>> request to each of the replicas.
>>
>> This seems to be a very interesting alternative since it would
>> basically do away with data replication altogether. Thinking about
>> it a little more: in the worst case, a small update query can
>> require replicating the entire dataset (e.g., delete everything),
>> whereas the cost of replicating a query of a few hundred bytes is
>> low. In other words, the cost of replicating queries varies little,
>> whereas the cost of replicating the data changed as a result of a
>> query can vary dramatically. If you further assume eventual
>> consistency [2] (i.e., all nodes eventually have the same state of
>> the data), then it would also not matter how fast the individual
>> nodes are in processing the queries. You would "only" need global
>> update query ordering to ensure that all nodes eventually reach the
>> same state; otherwise, if nodes receive update queries in different
>> orders, their states diverge over time. Note that readers might, of
>> course, see different results depending on which node they query -
>> that's the price of eventual consistency.
>
> What is more, having quorum control for reads and writes is a way to
> allow for different characteristics and for partition tolerance.
>
> If N = number of machines, R = read quorum, W = write quorum, then
>
>   R + W >  N gives consistency, and the majority partition wins;
>   R + W <= N gives eventual consistency, and the need to worry about
>              partition recombination.
>
> Choose R and W to meet the specific service needs.
>
> E.g. R=1 to maximise query throughput.
>
>     Andy
>
>> Thorsten
>>
>> [1] http://en.wikipedia.org/wiki/CAP_theorem
>> [2] http://en.wikipedia.org/wiki/Eventual_consistency
>>
>>> In this case, what do we do if one of the replicas is down?
>>>
>>> Other alternatives?
>>>
>>> Is this something you need or would use?
>>>
>>> Paolo
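P.S. Coming back to Paolo's original rsync design: the master-side
replication step is essentially lock / flush / rsync / unlock. A rough
sketch of that sequence (the paths are invented, flushTdbToDisk()
stands in for whatever TDB sync call applies, and error/timeout
handling is omitted - none of this exists in Fuseki today):

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    /**
     * Master side of Paolo's scheme: while a slave is being refreshed,
     * writers are blocked; readers of the master are unaffected.
     */
    public class MasterReplicator {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        /** Called when a slave reports (via the extra servlet) that it is behind. */
        public void replicateTo(String slaveHost) throws Exception {
            lock.writeLock().lock();        // exclusive write lock during the copy
            try {
                flushTdbToDisk();           // placeholder: sync the TDB indexes to disk
                // After the first run, rsync only transfers the changed blocks.
                Process p = new ProcessBuilder(
                        "rsync", "-a", "--delete",
                        "/var/fuseki/DB/",
                        slaveHost + ":/var/fuseki/DB/").start();
                if (p.waitFor() != 0)
                    throw new RuntimeException("rsync to " + slaveHost + " failed");
            } finally {
                lock.writeLock().unlock();  // or release after a timeout, as Paolo notes
            }
        }

        private void flushTdbToDisk() {
            // Placeholder: e.g. TDB.sync(dataset) in real code.
        }
    }

The slave would then drop its TDB caches (close and re-open the
dataset) before answering queries again.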
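P.P.S. The query-replication alternative, with the global update
ordering I mentioned, could look roughly like this (an in-memory
sketch only: the queues would have to be durable to survive a crash
of the sequencer, and apply() is a placeholder):

    import java.util.List;
    import java.util.concurrent.BlockingQueue;

    /**
     * All update queries pass through one sequencer, which feeds a FIFO
     * queue per replica. Because every replica drains its queue in order,
     * all replicas apply the same updates in the same global order and
     * eventually converge - eventual consistency.
     */
    public class UpdateSequencer {
        private final List<BlockingQueue<String>> replicaQueues;

        public UpdateSequencer(List<BlockingQueue<String>> replicaQueues) {
            this.replicaQueues = replicaQueues;
        }

        /** Single entry point for writes; this defines the global order. */
        public synchronized void submit(String sparqlUpdate) {
            for (BlockingQueue<String> q : replicaQueues)
                q.add(sparqlUpdate);   // asynchronous: a slow or down replica just lags
        }
    }

    /** Per-replica worker draining its queue strictly in order. */
    class ReplicaWorker implements Runnable {
        private final BlockingQueue<String> queue;

        ReplicaWorker(BlockingQueue<String> queue) { this.queue = queue; }

        public void run() {
            while (true) {
                try {
                    apply(queue.take());
                } catch (InterruptedException e) {
                    return;
                }
            }
        }

        private void apply(String sparqlUpdate) {
            // Placeholder: run the update against the local dataset,
            // e.g. via the SPARQL protocol or Jena's UpdateAction.
        }
    }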

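And to make Andy's quorum arithmetic concrete: with N nodes, a write
succeeds once W of them acknowledge it, and R + W > N guarantees that
every read quorum overlaps every write quorum. A sketch (Node.apply()
is an invented placeholder; a real implementation would contact the
nodes concurrently rather than in a loop):

    import java.util.List;

    /**
     * Quorum-controlled writes: with R + W > N reads are consistent and
     * the majority partition wins; with R + W <= N you get eventual
     * consistency and must handle partition recombination.
     */
    public class QuorumWriter {
        private final List<Node> nodes;   // N = nodes.size()
        private final int writeQuorum;    // W

        public QuorumWriter(List<Node> nodes, int writeQuorum) {
            this.nodes = nodes;
            this.writeQuorum = writeQuorum;
        }

        public boolean write(String sparqlUpdate) {
            int acks = 0;
            for (Node n : nodes) {
                try {
                    n.apply(sparqlUpdate);
                    acks++;
                } catch (Exception e) {
                    // Node down or unreachable; it must catch up later.
                }
            }
            return acks >= writeQuorum;   // success iff at least W nodes acknowledged
        }
    }

    interface Node {
        void apply(String sparqlUpdate) throws Exception;
    }

E.g. N=3, W=2, R=2 gives consistency (2+2 > 3), while W=3, R=1
maximises query throughput at the price of blocking writes whenever
any single node is down.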