On 09.09.2011, at 14:55, Andy Seaborne wrote:
>
> On 09/09/11 13:06, Thorsten Möller wrote:
>>
>> On 09.09.2011, at 00:20, Paolo Castagna wrote:
>>
>>> Hi, one feature I miss from Fuseki (and probably everybody who
>>> wants to use Fuseki to run SPARQL endpoints, public or private) is
>>> "high availability".
>>>
>>> One way to increase the availability of your SPARQL endpoints is
>>> to use "replication": you serve the same data with multiple
>>> machines and you put a load balancer in front of them,
>>> distributing queries to those machines.
>>>
>>> I'd like something relatively easy to set up, with close to zero
>>> admin cost at runtime, which would make it extremely easy to
>>> provision a new replica if necessary. I'd like something so easy
>>> that I could implement it in a couple of days.
>>>
>>> Here are some thoughts (inspired by Solr's master/slave
>>> architecture and replication)...
>>>
>>> One master, N slaves, all running exactly the same Fuseki code
>>> base. All writes go to the master. If the master is down, no
>>> writes are possible (this is the price to pay in exchange for
>>> extreme simplicity). All reads go to the N slaves (+ master?).
>>> Slaves might be out of sync with each other (again, the price of
>>> extreme simplicity). Every once in a while (configurable), each
>>> slave contacts the master to check whether it has received updates
>>> since the slave last checked. An additional servlet needs to be
>>> added to Fuseki to coordinate replication. Replication happens
>>> simply by running rsync (from Java), with the master pushing data
>>> out to a remote slave. During replication the master must acquire
>>> an exclusive write lock, sync the TDB indexes to disk, rsync with
>>> the slave, and release the lock when finished or after a timeout
>>> long enough for a first sync to succeed. Slaves need to drop any
>>> TDB caches after an rsync.
>>>
>>> The things I like about this approach are:
>>>
>>> - it's very simple
>>> - after the first time, rsync is incremental and quite fast
>>> - provisioning a new slave is trivial
>>> - it could be implemented relatively quickly
>>> - if the master receives an update while one or more slaves are
>>>   down, the slaves will catch up later
>>>
>>> The things I do not like are:
>>>
>>> - the master is a SPOF for writes
>>> - you need an external load balancer to send requests to the
>>>   slave(s)
>>> - slaves can be out of sync for a period of time
>>> - can a slave answer queries during the replication with rsync?
>>>
>>> What do you think?
>>
>> That's all well thought out, but it's more an approach to
>> scalability than to availability. The reason is that the single
>> write master remains a SPOF; hence, the availability for writes
>> does not change. Of course, one could assume that the master has
>> 100% availability, but then there would be no need to do anything
>> about availability in the first place.
>
> True, but.
>
> Make the front-end coordinator a simple router of requests. It
> tracks the state of the cluster but is not itself a database node.
>
> This greatly reduces the chance of the SPOF actually failing.
>
> To really remove the SPOF, replicate the coordinator and use
> ZooKeeper (or similar) to assist in tracking the version state -
> there is a cost to coordination.
>
> One greatly simplifying assumption is to design for a datacentre,
> not for geographic distribution.
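Such a router can indeed be kept very thin. A minimal sketch of the
idea (illustrative only: the MASTER/SLAVES endpoints are made up, and
a real version would track node health - e.g. via ZooKeeper, as you
say - rather than assume every slave is up):

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    /**
     * Stateless front-end router: all updates go to the single master,
     * reads are spread round-robin over the slaves. It holds no data
     * itself, so restarting a failed router is cheap.
     */
    public class RequestRouter {
        private static final String MASTER = "http://master:3030/dataset";  // assumed endpoint
        private static final List<String> SLAVES = Arrays.asList(           // assumed endpoints
                "http://slave1:3030/dataset",
                "http://slave2:3030/dataset");

        private final AtomicInteger next = new AtomicInteger(0);

        /** Pick the backend for a request: writes to the master, reads round-robin. */
        public String route(boolean isUpdate) {
            if (isUpdate)
                return MASTER;
            // Mask the sign bit so the index stays valid after counter wrap-around.
            int i = (next.getAndIncrement() & 0x7fffffff) % SLAVES.size();
            return SLAVES.get(i);
        }
    }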
I would also add that designing for geographic distribution is
somewhat against the SemWeb vision: it's linked data, not distributed
data (but clients may aggregate).

Thorsten

> The partition problem can be simplified.
>
> Below ...
>
>>> I don't have an alternative that would be as simple to implement
>>> and that would increase the availability (in particular for reads)
>>> of a bunch of machines running Fuseki.
>>
>> Replication is inherently a tradeoff between consistency and
>> performance; recall the famous CAP theorem [1].
>>
>>> Replication is good at increasing your throughput in terms of
>>> queries/sec, and it can have a minor effect in reducing query
>>> response time because you have fewer concurrent queries per
>>> machine.
>>>
>>> Caching will also be very useful, and this is by no means a
>>> substitute for that. We would love to have both! The two things
>>> can and, IMHO, should coexist. In my opinion, availability should
>>> come first and caching soon after. The problem with caching first,
>>> with no availability, is that you have a problem with cache
>>> misses: your users don't get their responses back.
>>>
>>> Another approach would be to look at the Journal in TxTDB and see
>>> if we could have a Journal which would replicate changes to remote
>>> machines.
>>>
>>> Another alternative would be to submit a SPARQL Update or a write
>>> request to each of the replicas.
>>
>> This seems to be a very interesting alternative since it would
>> basically do away with data replication altogether. Thinking about
>> it a little more: in the worst case, a small update query can
>> require replicating the entire dataset (e.g., delete everything),
>> whereas the cost of replicating a query of a few hundred bytes is
>> low. In other words, the cost of replicating queries varies little,
>> whereas the cost of replicating the data changed as a result of a
>> query can vary dramatically. If you further assume eventual
>> consistency [2] (i.e., all nodes eventually have the same state of
>> the data), then it would also not matter how fast the individual
>> nodes are in processing the queries. You would "only" need global
>> update query ordering to ensure that all nodes eventually reach the
>> same state; otherwise, if nodes receive update queries in different
>> orders, their states diverge over time. Note that readers might, of
>> course, see different results depending on which node they query -
>> that's the price of eventual consistency.
>
> What is more, having quorum control for reads and writes is a way to
> allow for different characteristics and for partition tolerance.
>
> If N = number of machines, R = read quorum, W = write quorum, then
>
>   R + W >  N gives consistency, and the majority partition wins;
>   R + W <= N gives eventual consistency, and the need to worry about
>              partition recombination.
>
> Choose R and W to meet the specific service needs.
>
> E.g. R=1 to maximise query throughput.
>
>     Andy
>
>> Thorsten
>>
>> [1] http://en.wikipedia.org/wiki/CAP_theorem
>> [2] http://en.wikipedia.org/wiki/Eventual_consistency
>>
>>> In this case, what do we do if one of the replicas is down?
>>>
>>> Other alternatives?
>>>
>>> Is this something you need or would use?
>>>
>>> Paolo
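P.S. Coming back to Paolo's original rsync design: the master-side
replication step is essentially lock / flush / rsync / unlock. A rough
sketch of that sequence (the paths are invented, flushTdbToDisk()
stands in for whatever TDB sync call applies, and error/timeout
handling is omitted - none of this exists in Fuseki today):

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    /**
     * Master side of Paolo's scheme: while a slave is being refreshed,
     * writers are blocked; readers of the master are unaffected.
     */
    public class MasterReplicator {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        /** Called when a slave reports (via the extra servlet) that it is behind. */
        public void replicateTo(String slaveHost) throws Exception {
            lock.writeLock().lock();        // exclusive write lock during the copy
            try {
                flushTdbToDisk();           // placeholder: sync the TDB indexes to disk
                // After the first run, rsync only transfers the changed blocks.
                Process p = new ProcessBuilder(
                        "rsync", "-a", "--delete",
                        "/var/fuseki/DB/",
                        slaveHost + ":/var/fuseki/DB/").start();
                if (p.waitFor() != 0)
                    throw new RuntimeException("rsync to " + slaveHost + " failed");
            } finally {
                lock.writeLock().unlock();  // or release after a timeout, as Paolo notes
            }
        }

        private void flushTdbToDisk() {
            // Placeholder: e.g. TDB.sync(dataset) in real code.
        }
    }

The slave would then drop its TDB caches (close and re-open the
dataset) before answering queries again.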
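P.P.S. The query-replication alternative, with the global update
ordering I mentioned, could look roughly like this (an in-memory
sketch only: the queues would have to be durable to survive a crash
of the sequencer, and apply() is a placeholder):

    import java.util.List;
    import java.util.concurrent.BlockingQueue;

    /**
     * All update queries pass through one sequencer, which feeds a FIFO
     * queue per replica. Because every replica drains its queue in order,
     * all replicas apply the same updates in the same global order and
     * eventually converge - eventual consistency.
     */
    public class UpdateSequencer {
        private final List<BlockingQueue<String>> replicaQueues;

        public UpdateSequencer(List<BlockingQueue<String>> replicaQueues) {
            this.replicaQueues = replicaQueues;
        }

        /** Single entry point for writes; this defines the global order. */
        public synchronized void submit(String sparqlUpdate) {
            for (BlockingQueue<String> q : replicaQueues)
                q.add(sparqlUpdate);   // asynchronous: a slow or down replica just lags
        }
    }

    /** Per-replica worker draining its queue strictly in order. */
    class ReplicaWorker implements Runnable {
        private final BlockingQueue<String> queue;

        ReplicaWorker(BlockingQueue<String> queue) { this.queue = queue; }

        public void run() {
            while (true) {
                try {
                    apply(queue.take());
                } catch (InterruptedException e) {
                    return;
                }
            }
        }

        private void apply(String sparqlUpdate) {
            // Placeholder: run the update against the local dataset,
            // e.g. via the SPARQL protocol or Jena's UpdateAction.
        }
    }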

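And to make Andy's quorum arithmetic concrete: with N nodes, a write
succeeds once W of them acknowledge it, and R + W > N guarantees that
every read quorum overlaps every write quorum. A sketch (Node.apply()
is an invented placeholder; a real implementation would contact the
nodes concurrently rather than in a loop):

    import java.util.List;

    /**
     * Quorum-controlled writes: with R + W > N reads are consistent and
     * the majority partition wins; with R + W <= N you get eventual
     * consistency and must handle partition recombination.
     */
    public class QuorumWriter {
        private final List<Node> nodes;   // N = nodes.size()
        private final int writeQuorum;    // W

        public QuorumWriter(List<Node> nodes, int writeQuorum) {
            this.nodes = nodes;
            this.writeQuorum = writeQuorum;
        }

        public boolean write(String sparqlUpdate) {
            int acks = 0;
            for (Node n : nodes) {
                try {
                    n.apply(sparqlUpdate);
                    acks++;
                } catch (Exception e) {
                    // Node down or unreachable; it must catch up later.
                }
            }
            return acks >= writeQuorum;   // success iff at least W nodes acknowledged
        }
    }

    interface Node {
        void apply(String sparqlUpdate) throws Exception;
    }

E.g. N=3, W=2, R=2 gives consistency (2+2 > 3), while W=3, R=1
maximises query throughput at the price of blocking writes whenever
any single node is down.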