Buildbot proposal: combine portwatcher and portbuilder

Ryan Schmidt Sun, 11 Mar 2018 01:26:49 -0800

The current buildbot setup has a number of problems that I believe could be 
solved by combining the currently separate portwatcher and portbuilder 
schedulers into a single ports scheduler.


I am not suggesting that we return to the behavior of the ports scheduler on 
the old macOS forge buildbot system in which a single build would build all the 
specified ports. We will keep the current method of building only one port (and 
its dependencies) per build.

The problems I want to solve are the following:

1. Currently, portwatcher is responsible for updating a copy of mpbb, MacPorts 
base and a ports tree that it shares with portbuilder. Having portbuilder 
maintain its own copy would waste a lot of time. If someone makes one commit 
that changes 100 ports and then no further commits occur for hours, we only 
want to update mpbb, MacPorts base and the ports tree once, not 100 times. But 
the fact that it's shared means that portwatcher must (and is configured to) 
wait for all triggered portbuilder builds to finish before it processes the 
next commit. This works fine, unless the buildmaster is stopped while 
portwatcher builds are pending. This has happened several times when the 
servers lost power during a power outage. (The servers are on a UPS, but the 
UPS does not provide as much instantaneous power as I expected, so if the 
servers are busy building, they draw more power than the UPS can 
instantaneously provide and the servers shut down immediately. I might remove 
the buildworker machines from the UPS and leave only the buildmaster, modem and 
router on it.) When buildmaster comes back online, it sees the portbuilder 
build that was in progress and starts it again, but it also sees the 
portwatcher build that was in progress and starts it again. Now we have a 
portwatcher running (updating mpbb, updating MacPorts, updating the ports tree, 
and updating its portindex) while portbuilder is trying to install a port. The 
portbuilder can fail if it is trying to install ports at the moment that 
portwatcher is updating the index (see https://trac.macports.org/ticket/53587).

2. If a single portwatcher build "X" triggers many portbuilder builds, and 
while those portbuilder builds are in progress another commit comes in that 
would affect those ports, it don't notice until all portbuilder builds 
triggered by "X" are finished. This can waste time building ports that are 
already superseded by newer versions or revisions. An extreme example of this 
is if we were to force a portwatcher build for all ports (which we might want 
to do when a new version of macOS is released). mpbb, MacPorts base and the 
ports tree would be updated once, and then it would schedule a portbuilder for 
each port that had not yet been built. Building all ports will take weeks. 
During that time, a commit may come through that updates a port to a new 
version. But if the build of the old version of the port was still pending at 
that time, the buildbot will build the old version, because it can't update the 
ports tree until the current portwatcher build is done waiting for its 
triggered portbuilder builds.

3. When there are portwatcher builds pending, we have no idea how many 
portbuilder builds are pending. It may say there are e.g. 3 portbuilder builds 
pending, but the pending portwatchers could trigger any number of additional 
portbuilders.

An objection to this proposal was that buildbot 0.8 does not have the 
capability to dynamically create scheduler steps at runtime. But that's not 
required and that's not what I'm proposing.

Buildbot has the ability to call a function for each step to determine if that 
step should run, by specifying the doStepIf property. I recently started using 
this feature in portwatcher to skip the two trigger steps if there are no ports 
in the port list:

https://github.com/macports/macports-infrastructure/commit/18135d6c75698f88b48698473c9364063fb6fba9

Here is an example of what that looks like when it runs:

https://build.macports.org/builders/ports-10.13_x86_64-watcher/builds/3989

The only port that was committed there had already been built so it was 
excluded from the port list, leaving the list empty, so the portbuilder trigger 
step was skipped (to save time) and the mirror trigger step was skipped (to 
prevent it from printing an error that no ports were specified). The skipped 
steps are still shown in the web interface, but if that's not desired, they can 
be hidden by also using the hideStepIf property.

So my proposed combined ports scheduler would still contain all of the steps of 
the current portwatcher and portbuilder schedulers, but each build would still 
conceptually "be" either a portwatcher or a portbuilder, and for each build, 
the steps that don't relate to that conceptual function would be skipped and 
hidden.

Buildbot has the ability to associate custom properties with a build. We use 
this to set a "portname" property when we trigger portbuilder builds. 
portwatcher builds don't have that property. (They have a "portlist" property 
if they were forced from the web interface, and always have a "fullportlist" 
property which is the portlist plus the computed list of ports that were 
modified by the commit.) So if a build has the "portname" property, it is a 
portbuilder, otherwise it is a portwatcher. We can then write some simple 
functions:

    def is_portbuilder(step):
        return step.hasProperty('portname')

    def is_portwatcher(step):
        return not step.hasProperty('portname')

    def is_skipped(result, step):
        return (result == buildbot.status.results.SKIPPED)

The steps of the combined ports scheduler would then be:

* update mpbb (doStepIf=is_portwatcher, hideStepIf=is_skipped)
* clean (doStepIf=is_portwatcher, hideStepIf=is_skipped)
* selfupdate MacPorts (doStepIf=is_portwatcher, hideStepIf=is_skipped)
* sync ports tree (doStepIf=is_portwatcher, hideStepIf=is_skipped)
* get the list of subports (doStepIf=is_portwatcher, hideStepIf=is_skipped)
* trigger mirror scheduler (doStepIf=is_portwatcher, hideStepIf=is_skipped, 
waitForFinish=True)
* trigger ports scheduler for each subport (doStepIf=is_portwatcher, 
hideStepIf=True, waitForFinish=False)

* install dependencies (doStepIf=is_portbuilder, hideStepIf=is_skipped)
* install port (doStepIf=is_portbuilder, hideStepIf=is_skipped)
* gather archives (doStepIf=is_portbuilder, hideStepIf=is_skipped)
* upload archives to buildmaster (doStepIf=is_portbuilder, 
hideStepIf=is_skipped)
* deploy archives (doStepIf=is_portbuilder, hideStepIf=is_skipped)
* clean (doStepIf=is_portbuilder, hideStepIf=is_skipped)

(I'm advocating here for always hiding the ports trigger step. I don't find its 
output in the waterfall to be very useful and it can take up a lot of space if 
a lot of builds were triggered.)

This should solve problem (1). By having portwatcher and portbuilder tasks in 
the same queue, they can't run simultaneously so they can't cause the problems 
that happen when they run simultaneously.

Solving (2) and (3) requires an additional step. Buildbot allows you to define 
a nextBuild property when you create the builder, and pass it a function that 
determines which of the pending builds should go next.

http://docs.buildbot.net/0.8.12/manual/cfg-builders.html#builder-configuration

We would write a function that looks through the pending builds in the order in 
which they were scheduled, and picks the first one that is a portwatcher (the 
first one that doesn't have a "portname" property). If none are portwatchers, 
it picks the first* one. (* This is where we would later improve the situation 
to pick the next port in the correct dependency order, but I have another email 
about that.)

This solves (2) because now when a new commit comes in, a new "portwatcher" 
gets added to the end of the ports scheduler's queue, but it will be the next 
build picked immediately after the current "portbuilder" finishes, no matter 
how many other "portbuilders" are still pending. That will update the ports 
tree, so any pending builds for ports that were subsequently updated will build 
the now-current version, not the old version.

It also solves (3) because now if 100 "portbuilder" builds are pending, we 
don't have to wait until all of them are finished before we know how many 
builds all the pending "portwatcher" builds will schedule; we only have to wait 
until the current "portbuilder" finishes.

One drawback, depending on how you look at it, is that we would no longer be 
able to send a single combined email for all of the failed builds of a single 
commit, or of a single forced build. We would have to send an individual email 
for each failed build. Personally, I would prefer that, as the subject line of 
the email would make clear which port failed to build, rather than requiring me 
to open it to see what happened.

If we give the new combined scheduler a new name like "ports", that will 
invalidate all old links to build logs. That's not a deal-breaker for me, but 
since we often paste build log URLs into tickets, it would be nice to keep them 
alive if we can. Maybe defining empty portbuilder and portwatcher schedulers 
would be enough.

Buildbot proposal: combine portwatcher and portbuilder

Reply via email to