Dear Ryan, Conceptually I like the approach a lot.
*However*, this will generate an incomprehensible amount of emails when a new build slave gets added or when some revbumps are done etc. In particular from the 10.5/ppc machine. The individual emails will be a lot more helpful, but the amount of them ... This sounds totally hacky, but one way around that I can see is to write a special-purpose mailer. Either a job on the buildbot or a special external script that iterates through all the jobs from the last hour from all the builders and creates build failure summaries for each individual developer (author/committer/maintainer). Maybe this could be a build job on the builder master, but I don't know how tricky it would get to do it. How would you handle duplicate entries in the build list? Dependency order could also become semi-obsolete, but that's probably a price to be payed. I tried to play with buildNext, but I couldn't figure out how to read properties of the build request. Mojca On 11 March 2018 at 10:25, Ryan Schmidt wrote: > The current buildbot setup has a number of problems that I believe could be > solved by combining the currently separate portwatcher and portbuilder > schedulers into a single ports scheduler. > > I am not suggesting that we return to the behavior of the ports scheduler on > the old macOS forge buildbot system in which a single build would build all > the specified ports. We will keep the current method of building only one > port (and its dependencies) per build. > > The problems I want to solve are the following: > > 1. Currently, portwatcher is responsible for updating a copy of mpbb, > MacPorts base and a ports tree that it shares with portbuilder. Having > portbuilder maintain its own copy would waste a lot of time. If someone makes > one commit that changes 100 ports and then no further commits occur for > hours, we only want to update mpbb, MacPorts base and the ports tree once, > not 100 times. But the fact that it's shared means that portwatcher must (and > is configured to) wait for all triggered portbuilder builds to finish before > it processes the next commit. This works fine, unless the buildmaster is > stopped while portwatcher builds are pending. This has happened several times > when the servers lost power during a power outage. (The servers are on a UPS, > but the UPS does not provide as much instantaneous power as I expected, so if > the servers are busy building, they draw more power than the UPS can > instantaneously provide and the servers shut down immediately. I might remove > the buildworker machines from the UPS and leave only the buildmaster, modem > and router on it.) When buildmaster comes back online, it sees the > portbuilder build that was in progress and starts it again, but it also sees > the portwatcher build that was in progress and starts it again. Now we have a > portwatcher running (updating mpbb, updating MacPorts, updating the ports > tree, and updating its portindex) while portbuilder is trying to install a > port. The portbuilder can fail if it is trying to install ports at the moment > that portwatcher is updating the index (see > https://trac.macports.org/ticket/53587). > > 2. If a single portwatcher build "X" triggers many portbuilder builds, and > while those portbuilder builds are in progress another commit comes in that > would affect those ports, it don't notice until all portbuilder builds > triggered by "X" are finished. This can waste time building ports that are > already superseded by newer versions or revisions. An extreme example of this > is if we were to force a portwatcher build for all ports (which we might want > to do when a new version of macOS is released). mpbb, MacPorts base and the > ports tree would be updated once, and then it would schedule a portbuilder > for each port that had not yet been built. Building all ports will take > weeks. During that time, a commit may come through that updates a port to a > new version. But if the build of the old version of the port was still > pending at that time, the buildbot will build the old version, because it > can't update the ports tree until the current portwatcher build is done > waiting for its triggered portbuilder builds. > > 3. When there are portwatcher builds pending, we have no idea how many > portbuilder builds are pending. It may say there are e.g. 3 portbuilder > builds pending, but the pending portwatchers could trigger any number of > additional portbuilders. > > An objection to this proposal was that buildbot 0.8 does not have the > capability to dynamically create scheduler steps at runtime. But that's not > required and that's not what I'm proposing. > > Buildbot has the ability to call a function for each step to determine if > that step should run, by specifying the doStepIf property. I recently started > using this feature in portwatcher to skip the two trigger steps if there are > no ports in the port list: > > https://github.com/macports/macports-infrastructure/commit/18135d6c75698f88b48698473c9364063fb6fba9 > > Here is an example of what that looks like when it runs: > > https://build.macports.org/builders/ports-10.13_x86_64-watcher/builds/3989 > > The only port that was committed there had already been built so it was > excluded from the port list, leaving the list empty, so the portbuilder > trigger step was skipped (to save time) and the mirror trigger step was > skipped (to prevent it from printing an error that no ports were specified). > The skipped steps are still shown in the web interface, but if that's not > desired, they can be hidden by also using the hideStepIf property. > > So my proposed combined ports scheduler would still contain all of the steps > of the current portwatcher and portbuilder schedulers, but each build would > still conceptually "be" either a portwatcher or a portbuilder, and for each > build, the steps that don't relate to that conceptual function would be > skipped and hidden. > > Buildbot has the ability to associate custom properties with a build. We use > this to set a "portname" property when we trigger portbuilder builds. > portwatcher builds don't have that property. (They have a "portlist" property > if they were forced from the web interface, and always have a "fullportlist" > property which is the portlist plus the computed list of ports that were > modified by the commit.) So if a build has the "portname" property, it is a > portbuilder, otherwise it is a portwatcher. We can then write some simple > functions: > > def is_portbuilder(step): > return step.hasProperty('portname') > > def is_portwatcher(step): > return not step.hasProperty('portname') > > def is_skipped(result, step): > return (result == buildbot.status.results.SKIPPED) > > The steps of the combined ports scheduler would then be: > > * update mpbb (doStepIf=is_portwatcher, hideStepIf=is_skipped) > * clean (doStepIf=is_portwatcher, hideStepIf=is_skipped) > * selfupdate MacPorts (doStepIf=is_portwatcher, hideStepIf=is_skipped) > * sync ports tree (doStepIf=is_portwatcher, hideStepIf=is_skipped) > * get the list of subports (doStepIf=is_portwatcher, hideStepIf=is_skipped) > * trigger mirror scheduler (doStepIf=is_portwatcher, hideStepIf=is_skipped, > waitForFinish=True) > * trigger ports scheduler for each subport (doStepIf=is_portwatcher, > hideStepIf=True, waitForFinish=False) > > * install dependencies (doStepIf=is_portbuilder, hideStepIf=is_skipped) > * install port (doStepIf=is_portbuilder, hideStepIf=is_skipped) > * gather archives (doStepIf=is_portbuilder, hideStepIf=is_skipped) > * upload archives to buildmaster (doStepIf=is_portbuilder, > hideStepIf=is_skipped) > * deploy archives (doStepIf=is_portbuilder, hideStepIf=is_skipped) > * clean (doStepIf=is_portbuilder, hideStepIf=is_skipped) > > (I'm advocating here for always hiding the ports trigger step. I don't find > its output in the waterfall to be very useful and it can take up a lot of > space if a lot of builds were triggered.) > > This should solve problem (1). By having portwatcher and portbuilder tasks in > the same queue, they can't run simultaneously so they can't cause the > problems that happen when they run simultaneously. > > Solving (2) and (3) requires an additional step. Buildbot allows you to > define a nextBuild property when you create the builder, and pass it a > function that determines which of the pending builds should go next. > > http://docs.buildbot.net/0.8.12/manual/cfg-builders.html#builder-configuration > > We would write a function that looks through the pending builds in the order > in which they were scheduled, and picks the first one that is a portwatcher > (the first one that doesn't have a "portname" property). If none are > portwatchers, it picks the first* one. (* This is where we would later > improve the situation to pick the next port in the correct dependency order, > but I have another email about that.) > > This solves (2) because now when a new commit comes in, a new "portwatcher" > gets added to the end of the ports scheduler's queue, but it will be the next > build picked immediately after the current "portbuilder" finishes, no matter > how many other "portbuilders" are still pending. That will update the ports > tree, so any pending builds for ports that were subsequently updated will > build the now-current version, not the old version. > > It also solves (3) because now if 100 "portbuilder" builds are pending, we > don't have to wait until all of them are finished before we know how many > builds all the pending "portwatcher" builds will schedule; we only have to > wait until the current "portbuilder" finishes. > > One drawback, depending on how you look at it, is that we would no longer be > able to send a single combined email for all of the failed builds of a single > commit, or of a single forced build. We would have to send an individual > email for each failed build. Personally, I would prefer that, as the subject > line of the email would make clear which port failed to build, rather than > requiring me to open it to see what happened. > > If we give the new combined scheduler a new name like "ports", that will > invalidate all old links to build logs. That's not a deal-breaker for me, but > since we often paste build log URLs into tickets, it would be nice to keep > them alive if we can. Maybe defining empty portbuilder and portwatcher > schedulers would be enough. >
