Dear Ryan,

Conceptually I like the approach a lot.

*However*, this will generate an incomprehensible amount of emails
when a new build slave gets added or when some revbumps are done etc.
In particular from the 10.5/ppc machine. The individual emails will be
a lot more helpful, but the amount of them ...

This sounds totally hacky, but one way around that I can see is to
write a special-purpose mailer. Either a job on the buildbot or a
special external script that iterates through all the jobs from the
last hour from all the builders and creates build failure summaries
for each individual developer (author/committer/maintainer). Maybe
this could be a build job on the builder master, but I don't know how
tricky it would get to do it.

How would you handle duplicate entries in the build list? Dependency
order could also become semi-obsolete, but that's probably a price to
be payed.

I tried to play with buildNext, but I couldn't figure out how to read
properties of the build request.


On 11 March 2018 at 10:25, Ryan Schmidt wrote:
> The current buildbot setup has a number of problems that I believe could be 
> solved by combining the currently separate portwatcher and portbuilder 
> schedulers into a single ports scheduler.
> I am not suggesting that we return to the behavior of the ports scheduler on 
> the old macOS forge buildbot system in which a single build would build all 
> the specified ports. We will keep the current method of building only one 
> port (and its dependencies) per build.
> The problems I want to solve are the following:
> 1. Currently, portwatcher is responsible for updating a copy of mpbb, 
> MacPorts base and a ports tree that it shares with portbuilder. Having 
> portbuilder maintain its own copy would waste a lot of time. If someone makes 
> one commit that changes 100 ports and then no further commits occur for 
> hours, we only want to update mpbb, MacPorts base and the ports tree once, 
> not 100 times. But the fact that it's shared means that portwatcher must (and 
> is configured to) wait for all triggered portbuilder builds to finish before 
> it processes the next commit. This works fine, unless the buildmaster is 
> stopped while portwatcher builds are pending. This has happened several times 
> when the servers lost power during a power outage. (The servers are on a UPS, 
> but the UPS does not provide as much instantaneous power as I expected, so if 
> the servers are busy building, they draw more power than the UPS can 
> instantaneously provide and the servers shut down immediately. I might remove 
> the buildworker machines from the UPS and leave only the buildmaster, modem 
> and router on it.) When buildmaster comes back online, it sees the 
> portbuilder build that was in progress and starts it again, but it also sees 
> the portwatcher build that was in progress and starts it again. Now we have a 
> portwatcher running (updating mpbb, updating MacPorts, updating the ports 
> tree, and updating its portindex) while portbuilder is trying to install a 
> port. The portbuilder can fail if it is trying to install ports at the moment 
> that portwatcher is updating the index (see 
> 2. If a single portwatcher build "X" triggers many portbuilder builds, and 
> while those portbuilder builds are in progress another commit comes in that 
> would affect those ports, it don't notice until all portbuilder builds 
> triggered by "X" are finished. This can waste time building ports that are 
> already superseded by newer versions or revisions. An extreme example of this 
> is if we were to force a portwatcher build for all ports (which we might want 
> to do when a new version of macOS is released). mpbb, MacPorts base and the 
> ports tree would be updated once, and then it would schedule a portbuilder 
> for each port that had not yet been built. Building all ports will take 
> weeks. During that time, a commit may come through that updates a port to a 
> new version. But if the build of the old version of the port was still 
> pending at that time, the buildbot will build the old version, because it 
> can't update the ports tree until the current portwatcher build is done 
> waiting for its triggered portbuilder builds.
> 3. When there are portwatcher builds pending, we have no idea how many 
> portbuilder builds are pending. It may say there are e.g. 3 portbuilder 
> builds pending, but the pending portwatchers could trigger any number of 
> additional portbuilders.
> An objection to this proposal was that buildbot 0.8 does not have the 
> capability to dynamically create scheduler steps at runtime. But that's not 
> required and that's not what I'm proposing.
> Buildbot has the ability to call a function for each step to determine if 
> that step should run, by specifying the doStepIf property. I recently started 
> using this feature in portwatcher to skip the two trigger steps if there are 
> no ports in the port list:
> Here is an example of what that looks like when it runs:
> The only port that was committed there had already been built so it was 
> excluded from the port list, leaving the list empty, so the portbuilder 
> trigger step was skipped (to save time) and the mirror trigger step was 
> skipped (to prevent it from printing an error that no ports were specified). 
> The skipped steps are still shown in the web interface, but if that's not 
> desired, they can be hidden by also using the hideStepIf property.
> So my proposed combined ports scheduler would still contain all of the steps 
> of the current portwatcher and portbuilder schedulers, but each build would 
> still conceptually "be" either a portwatcher or a portbuilder, and for each 
> build, the steps that don't relate to that conceptual function would be 
> skipped and hidden.
> Buildbot has the ability to associate custom properties with a build. We use 
> this to set a "portname" property when we trigger portbuilder builds. 
> portwatcher builds don't have that property. (They have a "portlist" property 
> if they were forced from the web interface, and always have a "fullportlist" 
> property which is the portlist plus the computed list of ports that were 
> modified by the commit.) So if a build has the "portname" property, it is a 
> portbuilder, otherwise it is a portwatcher. We can then write some simple 
> functions:
>     def is_portbuilder(step):
>         return step.hasProperty('portname')
>     def is_portwatcher(step):
>         return not step.hasProperty('portname')
>     def is_skipped(result, step):
>         return (result == buildbot.status.results.SKIPPED)
> The steps of the combined ports scheduler would then be:
> * update mpbb (doStepIf=is_portwatcher, hideStepIf=is_skipped)
> * clean (doStepIf=is_portwatcher, hideStepIf=is_skipped)
> * selfupdate MacPorts (doStepIf=is_portwatcher, hideStepIf=is_skipped)
> * sync ports tree (doStepIf=is_portwatcher, hideStepIf=is_skipped)
> * get the list of subports (doStepIf=is_portwatcher, hideStepIf=is_skipped)
> * trigger mirror scheduler (doStepIf=is_portwatcher, hideStepIf=is_skipped, 
> waitForFinish=True)
> * trigger ports scheduler for each subport (doStepIf=is_portwatcher, 
> hideStepIf=True, waitForFinish=False)
> * install dependencies (doStepIf=is_portbuilder, hideStepIf=is_skipped)
> * install port (doStepIf=is_portbuilder, hideStepIf=is_skipped)
> * gather archives (doStepIf=is_portbuilder, hideStepIf=is_skipped)
> * upload archives to buildmaster (doStepIf=is_portbuilder, 
> hideStepIf=is_skipped)
> * deploy archives (doStepIf=is_portbuilder, hideStepIf=is_skipped)
> * clean (doStepIf=is_portbuilder, hideStepIf=is_skipped)
> (I'm advocating here for always hiding the ports trigger step. I don't find 
> its output in the waterfall to be very useful and it can take up a lot of 
> space if a lot of builds were triggered.)
> This should solve problem (1). By having portwatcher and portbuilder tasks in 
> the same queue, they can't run simultaneously so they can't cause the 
> problems that happen when they run simultaneously.
> Solving (2) and (3) requires an additional step. Buildbot allows you to 
> define a nextBuild property when you create the builder, and pass it a 
> function that determines which of the pending builds should go next.
> We would write a function that looks through the pending builds in the order 
> in which they were scheduled, and picks the first one that is a portwatcher 
> (the first one that doesn't have a "portname" property). If none are 
> portwatchers, it picks the first* one. (* This is where we would later 
> improve the situation to pick the next port in the correct dependency order, 
> but I have another email about that.)
> This solves (2) because now when a new commit comes in, a new "portwatcher" 
> gets added to the end of the ports scheduler's queue, but it will be the next 
> build picked immediately after the current "portbuilder" finishes, no matter 
> how many other "portbuilders" are still pending. That will update the ports 
> tree, so any pending builds for ports that were subsequently updated will 
> build the now-current version, not the old version.
> It also solves (3) because now if 100 "portbuilder" builds are pending, we 
> don't have to wait until all of them are finished before we know how many 
> builds all the pending "portwatcher" builds will schedule; we only have to 
> wait until the current "portbuilder" finishes.
> One drawback, depending on how you look at it, is that we would no longer be 
> able to send a single combined email for all of the failed builds of a single 
> commit, or of a single forced build. We would have to send an individual 
> email for each failed build. Personally, I would prefer that, as the subject 
> line of the email would make clear which port failed to build, rather than 
> requiring me to open it to see what happened.
> If we give the new combined scheduler a new name like "ports", that will 
> invalidate all old links to build logs. That's not a deal-breaker for me, but 
> since we often paste build log URLs into tickets, it would be nice to keep 
> them alive if we can. Maybe defining empty portbuilder and portwatcher 
> schedulers would be enough.

Reply via email to