Francis has already covered most of this, but I feel there is a small point that should be expanded on.

On Tue, Dec 14, 2010 at 3:28 PM, Max Bowsher <[email protected]> wrote:
> Including a "The new way to search bugs just within the Soyuz component
> of Launchpad is ..." (etc.) in the blog post would likely make it a lot
> less likely to make people think negatively about the change.

Max, I'm really very excited by the change we're making here, and I'd like to try to bring all the bits together: I think it's entirely positive, but if there are downsides or issues we're going to cause folk, we should address them clearly and openly.

The problem with including a 'new way to ...soyuz' statement in the blog post is that it incorrectly presumes that there is a correct 'old way'. There isn't, and here is why: the existing policy for where a bug should be filed is not *where the problem is visible* but instead *what part of the code base the change needs to happen in*. Note that this requires prescience: if we don't know what needs to change, there is no clear place to put any given bug at the moment. The triage job that CHR does is complicated by trying to guess *at the fix* when a bug is filed.

Because of this it is extremely likely that any bug searches users have been doing in Launchpad's bugs have been in the wrong place. There is, in fact, only one right place to search for Launchpad bugs and be confident you will find existing ones - https://bugs.launchpad.net/launchpad-project.

Bugs that affect the 'Soyuz component' are currently found in:
 - soyuz
 - registry
 - foundations
 - code
 - translations
 - web
 - buildd

As an example, a bug where bugtask changes time out sending email is currently in 'foundations'. Why? Because the issue might be due to mailserver performance/the serial nature of our mail handler.

The subcomponent approach for bug tracking makes some sense when you think of Launchpad as N parallel applications with one team maintaining each application: developers need to know that the bug is in their section of the application in order to pick it up and fix it. But tackling bug triage of Launchpad that way implies a very static partitioning of Launchpad (which puts up barriers), and also means that we have to resource each 'application' in advance by having a dedicated sub-team.

The very nature of having dedicated teams means that each thing gets its own work queue, which adds latency to fixing problems (LEAN argues for having as few queues as possible). This structure also means that having more folk work on areas that are in trouble becomes an exception rather than the rule (because folk are pulling from a per-team queue). And that means that it's not uncommon for a bug that is of project-wide high importance to stall when it moves from one team's region of maintenance to a smaller or busier team's area.

The new bug tracking structure is only the surface exposure of a more fundamental change: rather than having strictly defined regions of the code base, we're moving to a whole-project ownership model with squads responsible for getting things done rather than for regions of the code base. Each squad will be a small team, of a size that can work well together on a single project, timezone compatible, and ideally with a good spread of the skills that go into making Launchpad changes: JavaScript, UI, Zope, PostgreSQL.

The squads then are jointly responsible for the entire Launchpad project. If we split existing code into two (refactoring for maintenance), we don't need to add a squad to cater for that. And vice versa when combining components makes sense.

One of the existing things we have trouble with is handling interrupts *and* doing project work.
Teams that are both component maintainers and doing projects tend to let interrupts (bug reports, timeouts, OOPSes) fall by the wayside until their big project is done. This is natural because doing big projects is hard and needs concentration, and because the team is the sole owner of its part of the codebase, no one else is doing the interrupt work while the team is focused on the project.

A very nice thing about the squads approach is that at any point in time a given squad will be just doing interrupts, or just doing a big project. Squads will get furlough from the heavy lifting involved in project work. Something like project, interrupts, project, interrupts. This is much more flexible too - if we are drowning in bugs, Francis can simply not assign a big project to the next squad when an existing project is finished. Conversely, if we need to do more project work in parallel, he can ask a squad to come out of maintenance mode early.

Now, there is a bit of a tradeoff here: we're changing from very focused teams with deep domain knowledge to project-wide teams with deep stack knowledge. (Rather than a team that knows all about (picking an arbitrary one) bazaar, but isn't expected to know about all the layers in our environment, we have a squad that's expected to know all the layers and may not know all about bazaar.) This means that we'll pay a context switch for the members of a squad when they go from working on an issue in bazaar.launchpad.net to an issue in answers.launchpad.net.

OTOH we are eliminating massive cross-team queues, and getting many more eyeballs on the code - in extreme cases 10 times the number of folk will be responsible for code that previously was all-but-orphaned. Past a ramping-up phase, we're hoping to balance interrupts and projects /much/ better, which should let project work advance more quickly. And on the bug triage side, we will have removed the tension between *where the fix goes*, *where the symptoms are* and *how important the bug is*.

Which is a huge win.

-Rob

