Dmitry,

I think this discussion has diverged somewhat from the original topic, but I agree with you that we must also attack the problem at the process level.
With the model you propose (and also with the existing model), I would also like to stress the need for continuous, automatic builds triggered by each incoming change and compared against the last working change. With that in place, it is possible to update labels (tags) such as "last_clean_build", "last_nightly_build", etc. That way any build breakage would only be visible at the tip.
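Here is a minimal sketch of the sliding-label idea. It assumes a bot receives a pass/fail result for each build of a pushed changeset. Only the label names ("last_clean_build", "last_nightly_build") come from the mail above. The `SlidingLabels` class, its methods and the changeset ids are hypothetical. A label moves only to a change that both built and passed its tests, and it never moves backwards. That way, results arriving out of order from parallel builds cannot rewind it.

```python
from typing import Dict, List, Optional


class SlidingLabels:
    """Toy model of sliding build labels (hypothetical; label names from the mail)."""

    def __init__(self) -> None:
        self.revs: List[str] = []          # changesets in push order; revs[-1] is the tip
        self.labels: Dict[str, str] = {}   # label name -> changeset it points at

    def push(self, rev: str) -> None:
        """A new change arrives; the tip advances unconditionally."""
        self.revs.append(rev)

    def report(self, label: str, rev: str, built: bool, tested: bool) -> None:
        """A build/test job for `rev` finished. Move `label` only if the job was
        clean and `rev` is newer than the change the label already points at."""
        if not (built and tested):
            return
        current: Optional[str] = self.labels.get(label)
        if current is None or self.revs.index(rev) > self.revs.index(current):
            self.labels[label] = rev

    def since(self, label: str) -> List[str]:
        """Changes pushed after `label`: unverified, so any breakage is confined to them."""
        current = self.labels.get(label)
        if current is None:
            return list(self.revs)
        return self.revs[self.revs.index(current) + 1:]


# Usage: three pushes; c2 breaks the tests and c3 fails to build.
repo = SlidingLabels()
for rev in ["c1", "c2", "c3"]:
    repo.push(rev)
repo.report("last_clean_build", "c1", built=True, tested=True)
repo.report("last_clean_build", "c2", built=True, tested=False)
repo.report("last_clean_build", "c3", built=False, tested=False)
repo.report("last_nightly_build", "c1", built=True, tested=True)
print(repo.labels)                     # {'last_clean_build': 'c1', 'last_nightly_build': 'c1'}
print(repo.since("last_clean_build"))  # ['c2', 'c3']
```

Developers who sync to a label never see the changes listed by `since()`. Only someone working directly at the tip does.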
However, when submitting a new change, care should be taken to base it on a working tip and to verify that it builds and passes its tests (personal check-in testing). This is actually close to the model we had for the JRockit source base. WLS also uses sliding labels for "last_clean_build", and developers there most often do only partial builds themselves.
Regarding external committers, I think they need both the option of building and testing locally and access to an OpenJDK-specific queue for build and test submissions.
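One way to picture such a dedicated submission queue is as one more pool in the kind of distributed build/test system described further down in the thread. In that system, jobs are routed to pools dedicated to RE/checkin/perf/stress, and the pools are rebalanced according to actual usage. The sketch below only illustrates that assumption. The `BuildTestQueue` class, the pool and job names, and the rebalancing rule are all hypothetical. The rule moves one machine from the least-used pool to the most-used one. The sketch also assumes every machine in a pool is free at each dispatch round.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class BuildTestQueue:
    """Hypothetical dispatcher: one waiting queue per pool, pools dedicated to task types."""

    def __init__(self, pools: Dict[str, List[str]]) -> None:
        # Initial pool sizes stand in for "predictions of load".
        self.pools = {task: list(machines) for task, machines in pools.items()}
        self.waiting: Dict[str, Deque[str]] = {task: deque() for task in pools}
        self.submitted: Dict[str, int] = {task: 0 for task in pools}  # usage data

    def submit(self, task: str, job: str) -> None:
        if task not in self.waiting:
            raise ValueError(f"no pool for task type {task!r}")
        self.waiting[task].append(job)
        self.submitted[task] += 1

    def dispatch(self, task: str) -> List[Tuple[str, str]]:
        """Give each machine in the pool at most one waiting job (assumes all are free)."""
        assignments = []
        for machine in self.pools[task]:
            if not self.waiting[task]:
                break
            assignments.append((self.waiting[task].popleft(), machine))
        return assignments

    def rebalance(self) -> None:
        """Move one machine from the least-used pool to the most-used one,
        never emptying a pool completely."""
        busiest = max(self.submitted, key=self.submitted.__getitem__)
        idlest = min(self.submitted, key=self.submitted.__getitem__)
        if busiest != idlest and len(self.pools[idlest]) > 1:
            self.pools[busiest].append(self.pools[idlest].pop())


q = BuildTestQueue({"checkin": ["m1", "m2"], "perf": ["m3", "m4"], "openjdk": ["m5"]})
q.submit("checkin", "c-job-1")
q.submit("openjdk", "ext-job-1")
q.submit("openjdk", "ext-job-2")
print(q.dispatch("openjdk"))   # [('ext-job-1', 'm5')]
q.rebalance()                  # openjdk is busiest, perf idlest: m4 moves to openjdk
print(q.dispatch("openjdk"))   # [('ext-job-2', 'm5')]
```

Keeping external submissions in their own pool means a burst of them cannot starve internal check-in jobs. Usage data can still grow that pool over time.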
By making use of cross-compilation and open tools, it is possible to at least verify locally that the product builds. Even better would be to also supply preconfigured VMs with the standard build and test environment we require (e.g. the obsolete Solaris or Linux distros). In general, a build and test setup that can be configured automatically (including on Windows) will help both internal and external developers (see also the build-infra project).
/Robert

On 01/30/2012 10:09 AM, Dmitry Samersoff wrote:
John,

Actually, the goal of my letter is not to promote a new integration scheme, just to remind us that we need to put some effort into reviewing and optimizing our internal process.

But see my answers below (inline):

The integration method I mentioned is often used in open source projects because it doesn't require any special infrastructure for external committers. The only thing needed to do a safe commit is write access to the integration (-gate) workspace.

On 2012-01-30 06:35, John Coomes wrote:

We have chosen a model: build->test->integrate, but we may consider a different approach: integrate->build->test->[backout if necessary]

In that model, you can never rely on the repository having any degree of stability. It may not even build at a given moment.

What happens today if Developer A and Developer B change the same line of a source file? What happens today if Developer A changes some_func() but Developer B relies on some_func()? We would get a fault *after* all integration tests, and SQE would file one more nightly bug. By the time someone investigates it and provides a fix, the bad code will have been distributed to all dev workspaces.

Developer (A) integrates his changeset into the integration workspace.
Bot takes a snapshot and starts building/testing.
Developer (B) integrates his changeset into the integration workspace.
Bot takes a snapshot and starts building/testing.
If job A failed, the bot locks the integration ws, restores it to the pre-A state, applies the B-patch and unlocks the ws.

Don't forget the trusting souls that pulled from the integration repo after A inflicted the breakage: they each waste time cleaning up a copy of A's mess.

Nobody pulls from the -gate repository today, and nobody is expected to. The -gate to ws merge continues as usual. To remove a faulty changeset we need about fifteen minutes for the whole JDK at worst.

-Dmitry

-John

On 2012-01-29 23:52, Kelly O'Hair wrote:
On Jan 29, 2012, at 10:23 AM, Georges Saab wrote:

I'm missing something.
How can everybody using the exact same system scale to 100's of developers?

System = distributed build and test of OpenJDK

Ah ha... I'm down in the trenches dealing with machines in dozens of different OS/arch variations. You are speaking at a higher level; I need to crawl out of the basement.

Developers send in jobs.
Jobs are distributed across a pool of (HW/OS) resources.
The resources may be divided into pools dedicated to different tasks (RE/checkin/perf/stress).
The pools are populated initially according to predictions of load and then increased/rebalanced according to data on actual usage.
No assumptions are made about what exists on the machine other than HW/OS.
The build and test tasks are self-sufficient, i.e. they bootstrap themselves.
The bootstrapping is done in the same way for different build and test tasks.

Understood. We have talked about this before. I have also been on the search for the Holy Grail. ;^) This is why I keep working on JPRT.

The only scaling aspect that seems at all challenging is that the current checkin system is designed to serialize checkins in a way that apparently does not scale. Here there are some decisions to be made and tradeoffs, but this is nothing new in the world of open community development (or any large team development, for that matter).

The serialized-checkins issue can be minimized somewhat by using distributed SCMs (Mercurial, Git, etc.), using separate forests (fewer developers per source repository means fewer merge/sync issues), and having an integrator merge into a master. This has proven to work in many situations, but it also delays delivery to master, especially if the integration process is too heavyweight.
The JDK projects have been doing this for a long time; I'm sure many people have opinions on how successful it is or isn't.

It is my opinion that merges/syncs are some of the most dangerous things you can do to a source base, and anything we can do to avoid them is usually goodness. I don't think you should scale this without some very great care.

And that one system will naturally change over time too, so unless you are able to prevent all change to a system (impossible with security updates etc.), every use of that 'same system' will be different.

Yes, but it is possible to control these updates and have a staging environment, so you know that a HW/OS update will not break the existing successful build when it is rolled out to the build/test farm.

Possible, but not always easy. Auto-updating of everything has increased significantly over the years, making it harder to control completely. I've been doing this build & test stuff long enough to never expect anything to be 100% reliable. Hardware fails, software updates regress functionality, networks become unreliable, humans trip over power cords, virus scanners break things, etc. It just happens, and often it's not very predictable or reproducible. You can do lots of things to minimize issues, but at some point you just have to accept a few risks because the alternative isn't feasible or can't happen with the resources we have.

-kto

--
Dmitry Samersoff
Java Hotspot development team, SPB04
* There will come soft rains ...
--
Oracle
Robert Ottenhag | Senior Member of Technical Staff
Phone: +46850630961 | Fax: +46850630911 | Mobile: +46707106161
Oracle Java HotSpot Virtual Machine

ORACLE Sweden | Folkungagatan 122 | SE-116 30 Stockholm
Oracle Svenska AB, Kronborgsgränd 17, S-164 28 KISTA, reg.no. 556254-6746

Green Oracle: Oracle is committed to developing practices and products that help protect the environment
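Dmitry's integrate->build->test->[backout if necessary] scheme from the quoted thread can be modelled as a toy sketch. The `GateWorkspace` class and its methods are hypothetical. Changesets are plain ids, and "applying" a patch is just appending it. The sketch therefore assumes that later changesets reapply cleanly once a failing one is backed out. The some_func() dependency case raised in the thread is exactly where that assumption breaks.

```python
import threading
from typing import List


class GateWorkspace:
    """Toy model of an integration (-gate) workspace with bot-driven backout."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # "bot locks integration ws" while rewriting it
        self.changesets: List[str] = []

    def integrate(self, cs: str) -> List[str]:
        """A developer pushes `cs`; return the snapshot the bot builds and tests."""
        with self._lock:
            self.changesets.append(cs)
            return list(self.changesets)

    def back_out(self, bad: str) -> List[str]:
        """Restore the pre-`bad` state, then reapply every later changeset.
        Returns the reapplied changesets (their earlier test runs included `bad`)."""
        with self._lock:
            i = self.changesets.index(bad)
            later = self.changesets[i + 1:]
            self.changesets = self.changesets[:i] + later
            return later


gate = GateWorkspace()
snapshot_a = gate.integrate("A")   # bot starts building/testing ["A"]
snapshot_b = gate.integrate("B")   # bot starts building/testing ["A", "B"]
job_a_passed = False               # pretend the job for snapshot_a failed
if not job_a_passed:
    print(gate.back_out("A"))      # ['B']
print(gate.changesets)             # ['B']
```

`back_out()` returns the changesets it reapplied. Their earlier test runs included the bad change, so a real bot would presumably re-run them against the new snapshot rather than trust the old results.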