Hi Kai,

Kai Backman wrote:
Hi everyone,

We are working on designing a build cluster for the OpenOffice
BuildBot. Our goal is to have a farm of machines that community
contributors can use to quickly do (distributed) builds of OpenOffice.

I've heard that Sun Release Engineering has a build cluster in use.
We would love to know more about the cluster to help us design the
BuildBot one.

So here is a bunch of questions:
- How many machines are there in the cluster?

About 16 machines, with 2 to 8 processors each, plus a number of special-purpose machines (for example, old machines for building very old code lines), monitoring machines, test clients, etc.

- What hardware/OS are they running?

Fileserver: Sun-Fire 890 with 4 double core 1500 MHz Ultra-Sparc IV
            processors, 16 GBytes RAM, Solaris 10

Storage:    2 x Sun 3511 storage arrays with a total raw capacity
            of 10.8 TBytes, plus some older storage arrays

Build clients (nodes):
Solaris Sparc: Sun-Fire 880 with 8 x 900 MHz Sparc processors,
               16 GBytes RAM,
               Solaris 8
Solaris Intel: 2 Sun v60x (2 x 3.06 GHz Xeons)
               Solaris 9
Linux:         2 Sun v60x (2 x 3.06 GHz Xeons),
               2 dual-processor machines (2 x 2.8 GHz Xeons)
               SuSE 7.3
Windows:       6 Sun v60x (2 x 3.06 GHz Xeons),
               2 Sun v20z (2 x 1.6 GHz Opterons),
               Windows XP

We do product and non-product builds for most milestones and most platforms. The OS versions reflect the baseline for our builds; we need to use old versions of the OS to guarantee a broad set of suitable target platforms. The high number of Windows clients reflects the high pain of doing Windows builds :-).

- How does the network infrastructure work? What is the design and capacity?

Mixed Gigabit/100 MBit network, nothing special. The build clients and the file server are just a normal part of our network.

- How is the shared disk space handled? What type of server/software
are you using?

See above ... we currently use Sun QFS and plan to migrate to ZFS (included in Solaris 10 update 2). The shares are exported via NFSv4 and Samba. We prefer to use NFS on the Windows clients, too, because NFS yields better performance than Samba for our kind of load.

- How do you monitor the cluster? What loads (disk, CPU, network) are
you measuring? How do you measure them?

With our custom distribution software and standard tools. The load on the fileserver is low, the Sun-Fire 890 has ample power.

- What is the bottleneck? Is the cluster CPU, disk, RAM or network bound?

Building on Solaris and Linux is CPU bound (and quite fast); building on Windows is network bound and relatively slow. Tasks like copying back build milestones are, obviously, disk bound.

- How does the task distribution software work?

Every build machine hosts 4-16 so-called "build clients" (the number depends on processors, RAM, etc.), which are a kind of daemon. Each daemon accepts a job from the "build master"; a job consists of building one directory by spawning dmake and returning the results. The "build master" maintains the queues, determines which directories can be built now, distributes the jobs, and accepts the results. The current build can be viewed and controlled via a nice GUI from either the "build master" or a "build slave". "Build slaves" are subordinate copies of the "build master", so that several release engineers can watch the build concurrently.
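The scheduling logic of such a master can be sketched as follows. This is only a toy illustration (the real build master and its protocol are not public): directory names and the `num_clients` parameter are made up, and "sending a job to a client" is simulated by popping it straight from the in-flight set.

```python
import collections

def run_build(deps, num_clients):
    """Simulate a build master dispatching directory builds to clients.

    deps maps each directory to the set of directories it depends on.
    Returns the order in which directories finished building.
    """
    remaining = {d: set(p) for d, p in deps.items()}
    # directories whose prerequisites are already satisfied
    ready = collections.deque(d for d, p in remaining.items() if not p)
    in_flight = set()   # jobs currently handed out to clients (simulated)
    finished = []

    while ready or in_flight:
        # hand out jobs while idle clients are available
        while ready and len(in_flight) < num_clients:
            in_flight.add(ready.popleft())
        # simulate one in-flight job returning its result
        done = in_flight.pop()
        finished.append(done)
        # unlock directories whose prerequisites are now all built
        for d, prereqs in remaining.items():
            if done in prereqs:
                prereqs.discard(done)
                if not prereqs:
                    ready.append(d)
    return finished
```

Any order this sketch produces respects the dependency graph; the real master additionally tracks per-platform queues and client capabilities.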

- How does the cluster handle nodes dying during a build?

The job will be redistributed to another client if no response arrives within a certain time frame.
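This timeout-and-redistribute behavior can be illustrated with a small tracker. Again a hedged sketch, not the actual implementation: the class name, the job/client identifiers, and the fixed deadline model are all assumptions.

```python
import heapq
import time

class JobTracker:
    """Flag jobs for redistribution when a client fails to answer in time."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.deadlines = []      # (deadline, job) min-heap
        self.outstanding = {}    # job -> client it was dispatched to

    def dispatch(self, job, client, now=None):
        now = time.monotonic() if now is None else now
        self.outstanding[job] = client
        heapq.heappush(self.deadlines, (now + self.timeout, job))

    def complete(self, job):
        # a result arrived in time; forget the job
        self.outstanding.pop(job, None)

    def expired(self, now=None):
        """Return jobs whose deadline passed without a result."""
        now = time.monotonic() if now is None else now
        late = []
        while self.deadlines and self.deadlines[0][0] <= now:
            _, job = heapq.heappop(self.deadlines)
            if job in self.outstanding:    # still unanswered -> redistribute
                del self.outstanding[job]
                late.append(job)
        return late
```

The master would periodically call `expired()` and push the returned jobs back onto the ready queue for another client.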

- How many nodes can the build be parallelized on? 50? 100?

A typical number of nodes is 68 for 7 platforms (4 product builds and 3 non-product builds), which are built concurrently. The system is able to accommodate more nodes if more build machines are added.

- What is the utilization of the cluster? I.e., how much parallelism are
you able to extract from the build? 75%? 80%?

At the beginning of a build the parallelism is rather limited because the prerequisites need to be built first. Later the parallelism is pretty good; most of the time all clients (nodes) are doing something. When creating the package sets (a significant part of the total build time) the parallelism is perfect.
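The early-build restriction follows directly from the dependency graph: the achievable parallelism in each "wave" is the number of directories whose prerequisites are all done. A toy calculation (module names invented for illustration) makes this concrete:

```python
def parallelism_profile(deps):
    """Return the number of concurrently buildable directories per wave.

    deps maps each directory to the set of directories it depends on.
    """
    remaining = {d: set(p) for d, p in deps.items()}
    waves = []
    while remaining:
        # everything whose prerequisites are all built can start now
        wave = [d for d, p in remaining.items() if not p]
        if not wave:
            raise ValueError("dependency cycle")
        waves.append(len(wave))
        for d in wave:
            del remaining[d]
        for p in remaining.values():
            p.difference_update(wave)
    return waves
```

A deep chain at the start of the graph yields small early waves (few busy clients), while a wide fan-out later keeps every node occupied.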

How does the parallelism
scale? What is the optimum number of machines for a build?

Hard to tell; the above-mentioned number of clients is not yet enough to "saturate" the system. Please note that we do 7 different builds in parallel, and of course the Solaris/Linux/Windows product and non-product builds can share their clients: if there is nothing in the queue for a Linux product build, the client will happily work on Linux non-product builds.


Is there anything I'm not asking about that I should?

Thanks for the answers and helping out with this!

Hope this helps,
  Heiner

--
Jens-Heiner Rechtien
[EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
