So, it turns out that the rabbit test fixture was more trouble than we thought.
We've now tried three, or is it four, times to get it to be part of our test suite. Currently it sporadically fails to start up inside the test suite - the Erlang OTP decides it doesn't want to play.

We got our original test fixture code from U1, which runs up ephemeral servers as part of their test suite. Unlike us, they start that from outside the test suite. So one possibility is that it's the old 'python tramps all over SIGPIPE' behaviour tripping us up. We run rabbit from within python so that each worker in parallel test mode can get its own rabbit and not stomp on other tests.

Landscape, another internal Canonical project, also uses Rabbit, and they use the system Rabbit for their test suite. This seems undesirable to me because it means running the test suite depends on more local system configuration, which makes it harder to do on datacentre machines, as well as being more intrusive on dev machines.

In a separate but related matter, Elliot has promised to put me in touch with one of the Rabbit core devs who knows all about HA: apparently it is better than the docs say it is :).

Anyhow, the current state is this:
 * rabbit is currently out of our tree
 * We've an unknown amount of work to do to get it working in tests reliably
 * We're still quite a way away from having production rabbit installs meet all of https://dev.launchpad.net/ArchitectureGuide/ServicesRequirements

Now, it's not really a Launchpad issue, but within Canonical we try quite hard to use the same infrastructure across different projects, so that skills and knowledge are transferable. That means that if we want to use a different mq from rabbit we need a reasonably compelling reason behind that: ideally one which other projects would agree with and eventually migrate to. Now, when folk @ Canonical started deploying message queues, rabbitmq was basically 'it' - the 0mq schism came along later.
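Going back to the SIGPIPE theory for a moment: Python sets SIGPIPE to "ignore" at interpreter startup, and a child started from Python inherits that unless something resets it before exec. If that is what is upsetting rabbit (unconfirmed, it is only one possibility), the workaround would look roughly like the sketch below. The names spawn_server and restore_sigpipe are made up for illustration and are not taken from our fixture code.

```python
import signal
import subprocess


def restore_sigpipe():
    """Run in the child between fork and exec.

    Puts SIGPIPE back to its default action so the child does not
    inherit Python's 'ignore' disposition.
    """
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)


def spawn_server(argv, **popen_kwargs):
    """Start a server process with default SIGPIPE handling.

    Hypothetical helper: argv is whatever command the fixture runs.
    """
    return subprocess.Popen(argv, preexec_fn=restore_sigpipe, **popen_kwargs)
```

A fixture would call something like spawn_server(["rabbitmq-server"]) in place of a bare Popen. Note that on Python 3.2 and later subprocess.Popen already does this by default (restore_signals=True), so the explicit preexec_fn only matters on older interpreters.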
As I see it we have a few questions to answer:
 - should we invest $unknown_time in chasing this sporadic failure down to ground?
 - should we look at getting a rabbit expert to help us?
 - should we use rabbit?
 - and if not rabbit, what then [and what is compellingly different]?

AIUI Julian has asked Gavin to stop pushing rabbit forward for now; that means that we're de facto not investing in fixing it at the moment.

I don't know any deep-guru rabbit experts personally, but even if I did, the HA concerns really have me questioning rabbit as our long term future. So perhaps we should wait to talk with this rabbit HA expert, and if the resulting story is still overly icky, look closely at e.g. 0mq as a simpler proposition with equal HA facilities (simpler to deploy, simpler to admin, simpler to test with).

-Rob