On Tue, Jun 7, 2011 at 11:18 PM, John Arbash Meinel <j...@arbash-meinel.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> ...
>> It's doable, but AFAIK:
>> - none of the Canonical deployments have this aspect live
>> - it's susceptible to split-brain failure
>>
>> So I think we'd need to invest considerably more resources to get a
>> resilient HA Rabbit. We may want to do that in the medium term, but
>> /many/ of our initial use cases for Rabbit are primarily event
>> raising. So I think we can get some early benefit, and make per-case
>> risk assessments for use of its persistence features in the short
>> term.
>>
>> Anecdata: Twitter, who run Kestrel as their queueing system, simply
>> design their code to gracefully deal with a queue server going AWOL
>> (be that crash, boom, whatever).
>>
>> -Rob
>
> How much of HA is because you expect Rabbit to die, and how much of HA
> is because you want a way to deploy without taking down the whole
> system? Clustering seems like it would handle the second case. One
> node's queue is temporarily offline until it is brought back up, but the
> other nodes keep serving. And if you stop accepting new entries while
> you are shutting down, then you never have any messages delayed.
Rabbit does not run active-active, ever. So you can't keep serving while
one node is down: you have to fail over, which means degraded service (at
best) during the failover process (several seconds at least, from what I
can tell).

> If it is that you want to plan for Rabbit (or the machine it is running
> on) to fail non-deterministically, then certainly you need different
> security guarantees.
>
> However, isn't the current Postgres master a "if it goes down we all go
> down for a while" setup? Isn't that machine pretty reliable overall? (It
> certainly also suffers from "we can't softly shut down for upgrades",
> but it seems like the non-deterministic failures are pretty reasonable.)

I would like to fix the PostgreSQL one too. At the moment, the way we
work with it - due to its design around clustering and schema changes -
is to change things once a month, which drives latency for feature work
and performance work: we're *just now* landing a change we could have had
out there for three weeks, if we didn't have a four-week cycle.

PostgreSQL having defects in this area isn't a reason to bring in other,
similar defects in new components :)

-Rob
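
P.S. To make the "design for the queue server going AWOL" approach concrete,
here's a minimal sketch of a best-effort publisher. All names here are
hypothetical (`transport`, `QueueDownError` stand in for whatever client
library we'd actually use - this is not a real kombu/pika API): events are
buffered locally while the broker is down or failing over, drained on the
next publish, and dropped once the buffer fills, so callers never block on
an outage.

```python
class QueueDownError(Exception):
    """Raised by the transport while the queue server is unreachable.
    (Hypothetical - stands in for the client library's connection errors.)"""


class ResilientPublisher:
    """Best-effort event publisher: buffer during outages, never block.

    `transport` is any object with a send(message) method that raises
    QueueDownError while the broker is down (hypothetical interface).
    """

    def __init__(self, transport, max_buffer=1000):
        self.transport = transport
        self.buffer = []
        self.max_buffer = max_buffer

    def publish(self, message):
        """Return True if the message reached the broker, False if buffered
        or dropped. Callers treat events as fire-and-forget either way."""
        self._flush()  # drain anything buffered during an earlier outage
        try:
            self.transport.send(message)
            return True
        except QueueDownError:
            if len(self.buffer) < self.max_buffer:
                self.buffer.append(message)
            # else: drop the event - graceful degradation, not persistence
            return False

    def _flush(self):
        # Retry oldest-first; stop at the first failure to preserve order.
        while self.buffer:
            try:
                self.transport.send(self.buffer[0])
            except QueueDownError:
                return
            self.buffer.pop(0)
```

The point of the sketch is the risk assessment Rob describes: for
event-raising use cases, losing or delaying a few events during a broker
failover is acceptable, so the client absorbs the outage instead of
requiring an HA broker.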