On Tue, Jun 7, 2011 at 6:43 AM, Julian Edwards <julian.edwa...@canonical.com> wrote:
> On Friday 03 June 2011 03:33:13 Robert Collins wrote:
>> Rabbit has an awkward high availability story; specifically, it's not
>> trivial to get the reliability we have out of HTTP services. This is
>> partly because rabbit clusters don't distribute the queues and because
>> it's a more stateful and complex system than HTTP. Long story short, we
>> won't be in a position to use queues for persistence, and it's simpler
>> to use HTTP to gracefully handle a single backend node dying.
>
> This makes me sad :/ Queues are massively more useful if they are persistent,
> and this is one aspect that I was really looking forward to working with.
> There are ways around it of course, but it makes things more awkward for the
> consumer.
>
> Presumably you've been looking at http://www.rabbitmq.com/pacemaker.html ?
> I've had a quick glance but not digested anything.
Indeed. Basically you run a watchdog that notices rabbit is down and fires up rabbit on a separate node using the same shared disk (e.g. DRBD, OCFS2, etc.) and the same node id, and you do IP address handovers... shudder. It's doable, but AFAIK:

- none of the Canonical deployments have this aspect live
- it's susceptible to split-brain failure

So I think we'd need to invest considerably more resources to get a resilient HA rabbit. We may want to do that in the medium term, but /many/ of our initial use cases for rabbit are primarily event raising. So I think we can get some early benefit, and make per-case risk assessments for use of its persistence features in the short term.

Anecdata: Twitter, who run Kestrel as their queueing system, simply design their code to gracefully deal with a queue server going AWOL (be that crash, boom, whatever).

-Rob

_______________________________________________
Mailing list: https://launchpad.net/~launchpad-dev
Post to     : launchpad-dev@lists.launchpad.net
Unsubscribe : https://launchpad.net/~launchpad-dev
More help   : https://help.launchpad.net/ListHelp