Hi all,

I thought a bit about our build bots and how to deal with them. It takes us a long time to get the bots fixed once they break. And currently they are broken because something seems to have changed on the machines.

I see two issues: we do not monitor or fix the bots, and they are changing without our notice. (That is not meant as criticism, just an observation.) So I think we should address this, since we are not enough people for the process to work as is.

I am thinking about the following measures:

1) I would like to get information when a bot is failing, so we can take action without having to monitor the bots ourselves.

 1a) I know from Gavin that we could activate an email notification. I would like to be informed about failures in the future. What would be the right channel for such an email?

- dev: imho this could be too much noise overall.

- sysadmin: an option, but I am not sure this is the right place.

- a separate list for technical messages.

2) The system seems to fail often because something changed underneath and our old build environment breaks.

a) I would like to have a build environment health check, to verify that the system is working for the build and to report what is wrong. We could extend the autoconfig with more tests. Would that be the right way?

b) I would like to have more control over how the buildbots are configured. Maybe have the configuration better documented as code, in Puppet or another system. Maybe even set up the build system at each build.

c) As an alternative, I could imagine that we provide images like our Linux image, and adapt them for use by the build bots. This could be created for Windows too, and maybe even for Mac. It would make things easier for us.

This is a completely different strategy. I am not sure if infra would support us in this dockerization.
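
To make the health check idea from 2a) a bit more concrete, here is a minimal sketch of what such a check could look like as a standalone script. It only verifies that a set of tools can be found on the PATH and reports which ones are missing. The tool list and the function names are assumptions for illustration, not our actual build requirements, and this is not a replacement for the autoconfig tests.

```python
import shutil
import sys

# Hypothetical tool list, for illustration only; the real build
# requirements would have to be taken from our build documentation.
REQUIRED_TOOLS = ["gcc", "make", "perl"]


def check_tools(tools, which=shutil.which):
    """Map each tool name to its resolved path, or None if it is not found."""
    return {tool: which(tool) for tool in tools}


def missing_tools(results):
    """Return the sorted names of all tools that could not be found."""
    return sorted(name for name, path in results.items() if path is None)


def run_health_check(tools, which=shutil.which):
    """Print one status line per tool; return 0 if all were found, else 1."""
    results = check_tools(tools, which=which)
    for name, path in results.items():
        print(f"{name}: {path or 'MISSING'}")
    missing = missing_tools(results)
    if missing:
        print("Build environment incomplete: " + ", ".join(missing),
              file=sys.stderr)
        return 1
    return 0
```

A bot could run this as a first build step, for example with `sys.exit(run_health_check(REQUIRED_TOOLS))`, so that a broken machine shows up as a failed environment check instead of an obscure compile error.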

3) Timeline

I am very slow in working through my tasks. I want to look into the bots and I am volunteering, but I need to look at what is still open. So my situation is as follows:

1) Finishing OpenGrok (mostly documentation and infra tasks are still open.)

2) Pootle migration + getting the translation CI/CD going

- this involves the Linux build machine

3) Extension page update / rewrite

- I still need to research the best strategy. That is the next step.

4) Python 3 update

5) There is a security topic; it seems I am the only one currently looking into it.

6) MediaWiki dockerization and update to the latest version.

We are on a newer version than before, but it is still out of date. We should try to dockerize it imho. (The db remains a local build, so the effort is not that big.)

7) Fixing the Windows bots

That is my current prioritization. Since I placed the Windows bots down at 7, it may make sense, if no one else looks into them, that we delete the bots for now, as Matthias has suggested. I can then implement the strategy for the Linux bot, and we will learn from there whether the situation improves.

Of course, if there is a volunteer for any of the topics, I am willing to support the volunteer in getting it done.


Please provide me with your insights or questions. It will create more clarity.

Thanks, all the best

peter

On 07.12.2025 at 16:57, Gavin McDonald wrote:
https://ci2.apache.org/#/workers/13

Does anyone want to help fix these or shall we delete the jobs?



