Re: [Launchpad-dev] velocity: parallel testing or simplified merge machinery first

Gary Poster Tue, 08 Feb 2011 05:05:05 -0800

On Feb 7, 2011, at 9:32 PM, Robert Collins wrote:

> On Sat, Feb 5, 2011 at 12:10 AM, Gavin Panella
> <[email protected]> wrote:
>> On 4 February 2011 04:28, Robert Collins <[email protected]> wrote:
>>> I'm wondering if folk have a particularly strong opinion (and
>>> rationale :P) for which we should do first. They are *both* partly
>>> implemented, and *both* are likely to have long tails leading to
>>> niggly bits to sort out over some weeks.
>> 
>> My gut feeling is that velocity is hurt most when:
>> 
>> 1. Branches get lost in ec2, especially when there's no message to
>>   tell me or anyone else about it. I might not notice anything the
>>   matter until the following day.
> 
> SMM will indeed help with this, but it's extremely rare, isn't it?
> Certainly on an individual basis that would stall.


Actually, I'm not entirely clear how SMM would help with this.  My picture of 
SMM includes people usually continuing to run the test suite locally.

> 
>> 2. Branches get bounced out of pqm. Again, this is exacerbated when
>>   there is no message to tell anyone about it. There's also sometimes
>>   a need to work with a LOSA to figure out what the reason was.
> 
> This is RT 43883 which I've just filed; we really need to get this
> /fixed/ and stop having half-stabs at it. I've asked Francis to give
> it pri 90 - zomg. It's really affecting developers a lot.

To be clear, that now-fixed RT is about fixing the silence of the bounces: yay, 
and thank you!

However, Gavin's #2 is still very pertinent: testfix mode bounces branches 
after a failed test run, by definition.  The SMM idea bounces the branch that 
failed tests, and any branches that were unfortunate enough to be run 
simultaneously, but subsequent branch landings are unaffected.  That's the 
heart of the change.  An intended side-effect is that it also drastically 
simplifies the collection of landing machinery we have.
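
As a rough illustration of the difference described above, here is a minimal 
sketch in Python.  It is a toy model, not the real PQM, buildbot or SMM code: 
the function names, the "runs" structure (lists of branches tested together) 
and the "broken" set are all hypothetical.  The email does not say what happens 
to the failing run's own branches under testfix mode, so the sketch simply 
counts them as not cleanly landed.

```python
def land_testfix_style(runs, broken):
    """Toy model of testfix mode: one failed run blocks every later landing.

    `runs` is a list of lists of branch names tested together (hypothetical);
    `broken` is the set of branches whose tests fail (hypothetical).
    """
    landed, bounced = [], []
    testfix_mode = False
    for run in runs:
        if testfix_mode:
            # Everything submitted after the failure is bounced.
            bounced.extend(run)
        elif any(branch in broken for branch in run):
            # Simplification: treat the failing run as not cleanly landed.
            testfix_mode = True
            bounced.extend(run)
        else:
            landed.extend(run)
    return landed, bounced


def land_smm_style(runs, broken):
    """Toy model of the SMM idea: only the failing run is bounced."""
    landed, bounced = [], []
    for run in runs:
        if any(branch in broken for branch in run):
            # The failing branch and its unlucky run-mates are bounced...
            bounced.extend(run)
        else:
            # ...but later runs are unaffected.
            landed.extend(run)
    return landed, bounced


runs = [["a"], ["b", "c"], ["d"], ["e"]]
broken = {"b"}
print(land_testfix_style(runs, broken))
print(land_smm_style(runs, broken))
```

In this toy model the single bad branch "b" costs four landings under the 
testfix-style policy and two under the SMM-style policy.  The model leaves out 
how testfix mode is cleared again.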

I haven't commented on this thread before, so I'll collect a few additional 
thoughts here.

= TDD =

To address Aaron's comments, TDD can be done with a small subset of the test 
suite, and making that loop faster would mean reducing the time it takes to run 
one test or a few tests--by reducing the time it takes to start up their 
layers, say.  The parallel test initiative would not help there, as far as I 
can tell.
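
The reasoning above can be sketched with a back-of-the-envelope model in 
Python.  Everything here is an assumption made for illustration: the function 
name, the idea that each worker pays the full layer setup cost, and the numbers 
(60 seconds of layer setup, 2 seconds per test) are invented, not measurements 
of the Launchpad test suite.

```python
import math


def loop_seconds(layer_setup, per_test, n_tests, workers=1):
    """Wall-clock time of one test loop under a deliberately simple model.

    Assumes every worker sets up its layers once and that the tests are
    split evenly across workers (both are assumptions, not measurements).
    """
    tests_per_worker = math.ceil(n_tests / workers)
    return layer_setup + per_test * tests_per_worker


# A TDD loop running a single test: extra workers change nothing.
print(loop_seconds(60, 2, n_tests=1, workers=1))
print(loop_seconds(60, 2, n_tests=1, workers=8))

# Halving the layer setup cost is what shortens that loop.
print(loop_seconds(30, 2, n_tests=1, workers=1))

# A full-suite run, by contrast, does benefit from more workers.
print(loop_seconds(60, 2, n_tests=10000, workers=1))
print(loop_seconds(60, 2, n_tests=10000, workers=8))
```

Under these assumptions the one-test loop is dominated by layer startup, which 
parallelism does not touch, while the full run is dominated by the tests 
themselves.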

= Landing machinery vs. Parallel test suites =

I think fixing our landing machinery is a better goal than parallel test 
suites.  The pain I experience, and that my team reports, is tied up with 
landing issues such as testfix mode.  

That said, SMM is one approach to that goal.  If "parallel test suites" were 
recast as "fix our landing machinery by introducing parallel test suites of < 1 
hour and PQM as it was before, with one branch at a time" (as you proposed) 
then I'd be very interested. Importantly, that effort would not count as a 
success until the landing machinery itself had improved: testfix mode 
eliminated, and landing a branch shown to take less time on average than now.

I think it would be worth analyzing the technical merits of the two approaches. 
To agree with Julian's mail, the parallel test run story feels much riskier 
technically, but that's one person's (well, two people's ;-) ) observation of 
one aspect of the decision.  On the other side, solving the problem with 
parallel test suites and single-branch PQM runs *should* reduce or eliminate 
the need for the separate ec2 test pre-runs, which would be a huge win.  The 
risk/reward balance might lean away from SMM, even with greater risk for 
parallel test runs.  Happily, that's not my call.

To repeat and summarize, the *problem to be solved* IMO and in the opinion of 
most other people on this thread is to make our landing story better.

= State of SMM =

If we do go down the road of SMM, I have some technical thoughts about the 
current state of that effort.  I've shared them with Francis before, so they 
should come as no surprise to him, but I haven't spoken more publicly.  I'll 
summarize here.

 - The Foundations effort was largely aiming for a proof of concept that 
Foundations could maintain and improve while it was running. That would not be 
the goal of a feature squad.  This means that the squad would have to expend 
more effort on it than Foundations would have initially.  It also would 
probably mean that the end result would be nicer.
 - Tarmac was not designed for what we needed and getting it ready for that 
functionality was more contentious and problematic than we expected.  These 
issues are still not resolved.
 - Francis and I agreed that other systems, like Hudson/Jenkins, *might* be 
elegantly extensible enough to handle what we need themselves.  In that case,  
the "one running piece of software" would be Hudson/Jenkins with our 
extensions, rather than tarmac or PQM.  The question would be, what baseline 
functionality do we want to build off of?  Tarmac ended up not bringing much 
to the table for this particular problem, other than Paul's energy and 
interest, which admittedly is very nice to have.  Hudson/Jenkins would at least 
bring visibility to the test runs, which would be very nice, and is missing 
from Tarmac AFAIK.

= Summary =

Parallel test runs are a means to an end.  If the end we strive for is 
significantly increasing the speed and reliability of our landing, that's a 
potentially compelling argument to me.  SMM approaches the right problem, IMO, 
but still has some work left to it, and might not be as nice as a parallel test 
run solution *for the landing problem*.  That said, parallel test runs have 
been problematic in past attempts.

Gary