The Antora part of the website build is getting better at detecting problems and failing the build, and the website build seems to me to be failing more often as a result. Perhaps we can find ways to improve our process so that there are fewer problematic commits and it's easier to detect and fix problems earlier.
There are a few problems caused by interactions between commits that land close together in time, and by commits that bring in content that is obsolete due to recent website build changes. Let's ignore those :-) … the second kind, especially, will iron themselves out over time.

The core problem is that people keep merging PRs that change the documentation without checking that the change doesn't break the website build, either locally or as a CI check on the PR. They could in theory do a local website build that incorporates their changes, but right now that is far too hard and time consuming. (I'll discuss the problems with the projects that attempt a partial local build below.)

So one good step would be to make local website builds to check doc changes easy and quick. I've made some progress on this.

Another step would be for CI to check the website build on each PR, either the whole site or a partial build. I think GH Actions workflows can trigger each other, but I've never set that up. Do we have enough GH Actions time to do a full website build on every PR to any Camel subproject? Is it practical to trigger the website build only when something documentation-related changes? (This detection would need to be carefully set up in each subproject.) If these are possible, I think we should just do it. It's probably possible to set up quicker partial builds, but that's decidedly more complicated.

Another step would be to make it extremely visible when the Jenkins website build fails. I try to follow the dev list pretty closely and see a lot of GH PR CI build failures reported, but apparently the Jenkins build has been failing for several days and I had no idea.

In principle, what other steps could we take?

——

Comments on the existing attempts to have subproject-specific partial builds:

Dan Allen (of Antora) has repeatedly said that subsidiary builds, such as local or partial builds, should be done from clones of the repo containing the playbook for the actual site.
For a long time I disagreed, thinking that approaches like that of camel-quarkus, with a local build in the subproject, were workable, but I'm now convinced that they are totally unmaintainable. They rely on updating each such subproject every time the main playbook changes, in a way that requires deep understanding of the entire site build. It just isn't going to work, ever.

——

Maybe there's hope…

If we're going to encourage or require local builds of the website, there needs to be a defined file system relationship between the camel-website clone and the subproject clone(s). I have a "global" directory (named camel) into which I've cloned all the subprojects next to one another (together with some extra git work trees). I think this is the simplest arrangement, and I think we could require it.

Next, there needs to be an easy way (preferably automated) to modify the playbook to build against one (or possibly more) local clones. E.g., if I'm working on camel-quarkus, I should only need to have camel-quarkus cloned and still be able to do a build. Doing this is much more plausible if we can assume that every branch participating in the website is present and up to date locally. Does anyone know if it's possible to write a git script that can update branches without switching to them? If we can assume this, then the local build just involves changing the playbook source url from GitHub….<project>.git to `./../<project>` and adjusting the checked-out branch name.

Then there's the problem that the full Antora build now takes something like 6 minutes, which is too long for anyone to wait for. So we need an effective way of doing quick partial builds. I've been working on this, with some progress.

Dan has an idea he calls a site manifest: the site build writes out the content catalog, with information about the Antora coordinates and the site location of every page.
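For concreteness, here is a purely illustrative sketch of what one manifest entry might record. The actual format is Dan's design and I'm only guessing at the shape; every field name here is hypothetical:

```json
{
  "components": {
    "camel-quarkus": {
      "versions": {
        "2.7.x": {
          "pages": [
            {
              "module": "ROOT",
              "family": "page",
              "relative": "first-steps.adoc",
              "url": "/camel-quarkus/2.7.x/first-steps.html"
            }
          ]
        }
      }
    }
  }
}
```

The point is just that each entry carries enough of the Antora coordinates (component, version, module, family, relative path) to resolve an xref, plus the published URL.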
Then a partial build can read this in to populate its content catalog, so that xrefs can be properly resolved. This was originally developed to enable a "subsidiary site" to have xrefs to a "main site". I've adapted it as an Antora pipeline extension, and it can be used in a couple of ways:

- A site manifest could be published as part of the actual site. In this case the partial build would fetch it, and only pages actually present locally would get local links. You'd find out whether there are any problems, but it might be hard to reach the local pages through navigation.
- If you do a full build locally to generate a local site manifest, a partial build using that manifest will only overwrite the rebuilt local files, leaving you with a fully functional local site.
- Possibly the full Jenkins build could also package the Antora site as a zip archive, which local builds could fetch and unpack rather than doing a full local build.

With the site manifest, there's still the problem of modifying the playbook to build only a little bit. I've written another extension that you configure with the part you want to build, and it applies the appropriate filters. You can configure it down to a single page. It also watches for changes and rebuilds when it detects one; I think I'll need to make that configurable, since it's great for seeing your changes quickly but not what you want in a build step.

I have not yet tried to make it easy to select which subproject to build: so far it requires knowing how to configure the extensions. I've started having some ideas on how this might be done.

What I'm envisioning and hoping for is a pre-PR process in which you run, in a local camel-website clone, something like `yarn partial-build-camel-quarkus`, which in less than a minute detects any errors and produces a local site where you can look at your changes.

Thoughts?

David Jencks