The Antora part of the website build is getting better at detecting problems and failing the build, and the website build seems to me to be failing more often as a result. Perhaps we can find ways to improve our process so that there are fewer problematic commits and it's easier to detect and fix problems earlier.
There are a few problems caused by interactions between commits that land close together in time, and by commits that bring in content that is obsolete due to recent website build changes. Let's ignore those :-) … the second kind, especially, will iron themselves out over time.

The core problem is that people keep merging PRs that change the documentation without checking that the change doesn't break the website build, either locally or as a CI check on the PR. They could in theory do a local website build that incorporates their changes, but right now that is far too hard and time consuming. (I'll discuss the problems with the projects that attempt a partial local build below.)

So one good step would be to make local website builds to check doc changes easy and quick. I've made some progress on this.

Another step would be for CI to check the website build on each PR, either the whole site or a partial build. I think GH Actions workflows can trigger each other, but I've never set that up. Do we have enough GH Actions time to do a full website build on every PR to any Camel subproject? Is it practical to trigger the website build only when something documentation-related changes? (This detection would need to be carefully set up in each subproject.) If these are possible, I think we should just do it. It's probably possible to set up quicker partial builds, but that's decidedly more complicated.

Another step would be to make it extremely visible when the Jenkins website build fails. I try to follow the dev list pretty closely and see a lot of GH PR CI build failures reported, but apparently the Jenkins build has been failing for several days and I had no idea.

In principle, what other steps could we take?

——

Comments on the existing attempts to have subproject-specific partial builds:

Dan Allen (of Antora) has repeatedly said that subsidiary builds, such as local or partial builds, should be done from clones of the repo containing the playbook for the actual site.
For a long time I disagreed, thinking that approaches like that of camel-quarkus, with a local build in the subproject, were workable, but I'm now convinced that they are totally unmaintainable. They rely on updating each such subproject every time the main playbook changes, in a way that requires deep understanding of the entire site build. It just isn't going to work, ever.

——

Maybe there's hope…

If we're going to encourage or require local builds of the website, there needs to be a defined file system relationship between the camel-website clone and the subproject clone(s). I have a "global" directory (named camel) into which I've cloned all the subprojects next to one another (together with some extra git work trees). I think this is the simplest arrangement, and I think we could require it.

Next, there needs to be an easy way (preferably automated) to modify the playbook to build against one (or possibly more) local clones. E.g., if I'm working on camel-quarkus, I should only need to have camel-quarkus cloned and still be able to do a build. Doing this is much more plausible if we can assume that every branch participating in the website is present and up to date locally. Does anyone know if it's possible to write a git script that can update branches without switching to them? If we can assume this, then the local build just involves changing the playbook source url from GitHub….<project>.git to `./../<project>` and adjusting the checked-out branch name.

Then there's the problem that the full Antora build now takes something like 6 minutes, which is too long for anyone to wait for. So we need an effective way of doing quick partial builds. I've been working on this, with some progress.

Dan has an idea he calls a site manifest: the site build writes out the content catalog, with information about the Antora coordinates and the site location of every page.
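For concreteness, here is a purely illustrative sketch of what one manifest entry might record. The actual format is Dan's design and I'm only guessing at the shape; every field name here is hypothetical:

```json
{
  "components": {
    "camel-quarkus": {
      "versions": {
        "2.7.x": {
          "pages": [
            {
              "module": "ROOT",
              "family": "page",
              "relative": "first-steps.adoc",
              "url": "/camel-quarkus/2.7.x/first-steps.html"
            }
          ]
        }
      }
    }
  }
}
```

The point is just that each entry carries enough of the Antora coordinates (component, version, module, family, relative path) to resolve an xref, plus the published URL.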
Then a partial build can read this in to populate its content catalog, so that xrefs can be properly resolved. This was originally developed to enable a "subsidiary site" to have xrefs to a "main site". I've adapted it as an Antora pipeline extension, and it can be used in a couple of ways:

- A site manifest could be published as part of the actual site. In this case the partial build would fetch it, and only pages actually present locally would get local links. You'd find out whether there are any problems, but it might be hard to reach the local pages through navigation.
- If you do a full build locally to generate a local site manifest, a partial build using that manifest will only overwrite the rebuilt local files, leaving you with a fully functional local site.
- Possibly the full Jenkins build could also package the Antora site as a zip archive, which local builds could fetch and unpack rather than doing a full local build.

With the site manifest, there's still the problem of modifying the playbook to build only a little bit. I've written another extension that you configure with the part you want to build, and it applies the appropriate filters. You can configure it down to a single page. It also watches for changes and rebuilds when it detects one; I think I'll need to make that configurable, since it's great for seeing your changes quickly but not what you want in a build step.

I have not yet tried to make it easy to select which subproject to build: so far it requires knowing how to configure the extensions. I've started having some ideas on how this might be done.

What I'm envisioning and hoping for is a pre-PR process in which you run, in a local camel-website clone, something like `yarn partial-build-camel-quarkus`, which in less than a minute detects any errors and produces a local site where you can look at your changes.

Thoughts?

David Jencks