Mikhail Loenko wrote:
2006/12/18, Geir Magnusson Jr. <[EMAIL PROTECTED]>:
Mikhail Loenko wrote:
> 2006/12/1, Geir Magnusson Jr. <[EMAIL PROTECTED]>:
>>
>>
>> Mikhail Loenko wrote:
>> > 4) We have cruise controls running classlibrary tests on DRLVM. We
>> > need to decide what we will do when the DRLVM+Classlib cruise control
>> > reports a failure.
>>
>> Stop and fix the problem. Is there really a question here? I agree
>
> Yes, there is a question here. "Stop and fix" includes "discuss". But
> as we now know, discussion may take several days. And while some people
> discuss what the problem is, other people can't proceed with
> development and patch integration.
>
> To get a better pace and better CC up-time, we need something other than
> just "stop and fix". I suggest "revert and continue".
What's the difference, other than debating the semantics of "fix" and
"revert"?
We all agree - but I still don't think you're clearly stating the
problem. I think that the core problem is that we don't immediately
react to CC failure.
Immediately reacting to CC failure should be the first order of the day
here. To me, reacting means making the decision, quickly, about either
rolling back the change ("reverting") or doing something else. The key
is being responsive.
It seems that what happens is that we wait, and then sets of changes
pile up, and I think that doing mass rollbacks at that point would solve
it, but would make a mess.
An example of what I envision is when I broke the build in DRLVM:
Gregory told me immediately, and I fixed it immediately - w/o a rollback.
All I'm saying is:
1) We need to be far better with reaction time
I would say we need to be far better with fixing/reverting time.
If we reacted immediately and then discussed for two weeks, we would not
be any better off than we are now.
Yes, fixing/reverting is included. It's what I meant.
2) We have intelligent people - we can be agile in this by making
decisions (quickly!) on a case by case basis what to do.
I'll also suggest that we ask each committer to check the CC event
stream before committing, so no one commits on top of a broken build.
One of my problems is that I don't trust the CC stream, and don't
clearly see it because it's mixed in with the other drek on the commits@ list.
The problem is intermittent failures. I suggest that we exclude graphics
tests from CCs and probably have CC-specific exclude lists for networking
tests (or fix all the known intermittent failures right now :)
good idea - works for me.
We need to drive into stability - we've made amazing progress in the
last two months, and now we're down to the really, really hard stuff. I
think that excluding them to get rock-solid CC reporting is step 0,
and then step 1 is to try to grind out the intermittent failures.
geir