On Tue, Jun 7, 2011 at 9:48 AM, Jamu Kakar <jka...@kakar.ca> wrote:
> Hi,
>
>> I'm not sure what we should do queue wise; I'm inclined to stick with
>> Rabbit until its either too much bother, or something massively better
>> (e.g. massively simpler with sameish facilities, or similar complexity
>> and more facilities (like HA)) comes along.
>
> I can't help but wonder if skipping RabbitMQ for the reasons above is
> going too far. High availability is important, especially for the
> plumbing that connects everything together, but I wonder how much of
> it you really need? It seems the benefits of a queue based model are
> many and that using HTTP or protobufs will result in an architecture
> that, while it may be (more) highly available, will involve all kinds
> of other deployment, design and performance hassles.

So, for clarity, the reasons to *consider* skipping rabbit are:
 * it's a PITA to bring up reliably in a test environment.
 * something else functionally equivalent but simpler comes along
 * something with more facilities and same complexity comes along

Those seem like pretty darn good reasons to consider skipping *any*
commodity item.

> That said, I don't have enough experience with RabbitMQ to say,
> "you're thinking about this wrong", or conversely, "yes, this is a
> serious issue". I am slightly concerned that overengineering could
> lead to a suboptimal solution. The impression I get from the outside,
> and maybe I'm totally off the mark, is that Launchpad often chooses a
> hard path to Do Things Right(tm), and then the end result is that
> everything is hard.
>
> I also wonder how many of these services need to talk to each other.
> Maybe you could run many RabbitMQ instances and use them for
> particular tasks? For example, a bug-focused queue for bug-related
> operations, a code hosting-focused queue for code-related operations,
> etc. If one of them falls over you end up with degraded service, as
> opposed to losing everything. I don't know how viable that is, since
> I don't really understand what the topology of micro-services will
> look like.
> Also, is there something that will solve the HA issues you've brought
> up in the pipeline for RabbitMQ? Maybe it's something worth
> contributing to and/or living without for some time while support for
> these issues gets baked in?
>
> How do other people use RabbitMQ and sleep at night?

Those are good questions to ask. On the HTTP vs Rabbit question, I
think the decoupling between service point and implementation is a
useful thing to have, but if you look at the list of things we need in
place to consider a microservice maintainable -
https://dev.launchpad.net/ArchitectureGuide/ServicesRequirements -
most of those are not impacted by changing the protocol from HTTP+foo
to AMQP.

Launchpad has a history of awkward implementation decisions - yes,
that's true. However I think many of them are due to the complexity of
analysing scaling and performance (consider - predict which bottleneck
we will hit next in codehosting: CPU? memory? network bandwidth to the
main host? disk space? fs locks? concurrent IO rate to disks?...) and
then go back 6 years and predict which design will handle all the
bottlenecks gracefully.

It would be easy to throw stones, but we get 20-20 vision in hindsight.
I think that the folk (which includes me for some decisions - waaaay
back :)) did their best to analyse things at the time. However I think
they over-analysed: many problems our past selves designed for did not
occur, and many problems they did not design for have occurred.

So, I want us to simultaneously:
 - be able to diagnose problems /fast/
 - be able to recover from operational issues rapidly
 - look after our users' data
 - be able to modify the design rapidly to deal with the things we
   have not designed for.
 - have the lowest implementation cost to meet these four things

To that end, saying 'let's start with rabbit without using its
persistence features':
 - lets us leverage the ops team's familiarity with rabbit for
   diagnosis, logging, capacity planning
 - and their experience with it for recovering after it breaks
 - avoids concerns about data integrity or storage
 - can be modified easily to permit persistence (add HA) or to move to
   a less cumbersome implementation
 - looks pretty cheap to do (we have cookie-cutter deployment knowledge
   for http stacks).

-Rob