On Jan 29, 2012, at 10:23 AM, Georges Saab wrote:

>> 
>> I'm missing something. How can everybody using the exact same system scale 
>> to 100's of developers?
> 
> System = distributed build and test of OpenJDK

Ah ha...   I'm down in the trenches dealing with dozens of different OS/arch 
variations of machines.
You are speaking at a higher level; I need to crawl out of the basement.

> 
> Developers send in jobs 
> Jobs are distributed across a pool of (HW/OS) resources
> The resources may be divided into pools dedicated to different tasks 
> (RE/checkin/perf/stress)
> The pools are populated initially according to predictions of load and then 
> increased/rebalanced according to data on actual usage
> No assumptions made about what exists on the machine other than HW/OS
> The build and test tasks are self sufficient, i.e. bootstrap themselves 
> The bootstrapping is done in the same way for different build and test tasks

Understood. We have talked about this before.  I have also been on the search 
for the Holy Grail. ;^)
This is why I keep working on JPRT.
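
To make that concrete, here is a rough sketch (not JPRT code; the pool names, 
scripts, and job layout are all made up) of what a self-sufficient, 
self-bootstrapping job could look like to a dispatcher: the job carries its own 
bootstrap steps, so the only thing the system assumes about a resource is its 
HW/OS.

    # Hypothetical sketch only -- illustrates the "bootstrap yourself,
    # assume nothing but HW/OS" idea, not any real system's job format.
    import subprocess
    from dataclasses import dataclass, field

    @dataclass
    class Job:
        changeset: str    # what to build/test
        platform: str     # e.g. "linux-x64", "solaris-sparcv9"
        pool: str         # e.g. "checkin", "perf", "stress"
        bootstrap: list = field(default_factory=lambda: [
            "sh fetch-boot-jdk.sh",      # bring your own boot JDK
            "sh fetch-toolchain.sh",     # bring your own compilers
        ])
        steps: list = field(default_factory=lambda: [
            "sh configure", "make images", "make test"
        ])

    def run_on_resource(job, host):
        # Run bootstrap, then build/test, on a bare machine of the right
        # HW/OS; nothing is assumed to be preinstalled on 'host'.
        for cmd in job.bootstrap + job.steps:
            if subprocess.run(["ssh", host, cmd]).returncode != 0:
                return False
        return True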

> 
> The only scaling aspect that seems at all challenging is that the current 
> checkin system is designed to serialize checkins in a way that apparently 
> does not scale -- here there are some decisions to be made and tradeoffs but 
> this is nothing new in the world of Open community development (or any large 
> team development for that matter)

The serialized-checkins issue can be minimized somewhat by using distributed 
SCMs (Mercurial, Git, etc.), using separate forests (fewer developers per 
source repository means fewer merge/sync issues), and having an integrator 
merge into a master. This has proven to work in many situations, but it also 
delays delivery to the master, especially if the integration process is too 
heavyweight.
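
Roughly, the integrator's loop is something like the sketch below (the URLs are 
made up, and I'm treating a forest as a single repository to keep it short; the 
hg commands themselves are just the standard ones):

    # Hypothetical sketch of a forest integrator's job, not a real script.
    import subprocess

    MASTER = "https://hg.example.org/jdk/master"         # made-up URL
    GROUP  = "https://hg.example.org/jdk/group-forest"   # made-up URL

    def hg(*args, cwd="master-clone"):
        subprocess.run(["hg", *args], cwd=cwd, check=True)

    def integrate():
        hg("clone", MASTER, "master-clone", cwd=".")
        hg("pull", GROUP)     # pull the group's changesets into the clone
        hg("merge")           # the risky part: merge the two heads
        # ... build and test the merged workspace here, before pushing ...
        hg("commit", "-m", "Merge group forest into master")
        hg("push", MASTER)

The build-and-test gate in the middle of that loop is exactly where the process 
can get too heavyweight.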

The JDK project has been doing this for a long time; I'm sure many people have 
opinions as to how successful it is or isn't.

It is my opinion that merges/syncs are some of the most dangerous things you 
can do to a source base, and anything we can do to avoid them is usually a 
good thing. I don't think you should scale this up without very great care.

> 
>> 
>> And that one system will naturally change over time too, so unless you are 
>> able to prevent all change
>> to a system (impossible with security updates etc) every use of that 'same 
>> system' will be different.
> 
> Yes, but it is possible to control this update and have a staging environment 
> so you know that a HW/OS update will not break the existing successful build 
> when rolled out to the build/test farm.

Possible, but not always easy. The auto-updating of everything has increased 
significantly over the years, making it harder to control completely.

I've been doing this build & test stuff long enough to never expect anything 
to be 100% reliable. Hardware fails, software updates regress functionality, 
networks become unreliable, humans trip over power cords, virus scanners break 
things, etc. It just happens, and often it's not very predictable or 
reproducible. You can do lots of things to minimize issues, but at some point 
you just have to accept a few risks, because the alternative isn't feasible 
with the resources we have.

-kto

