Dear all,

As you are likely aware, we will soon be migrating from Gerrit to GitHub. Some
of you have reached out to me with questions about the testing infrastructure
on GitHub and how it may be used and improved. I shall address these questions
in this email.

GitHub allows for testing via its “Actions” infrastructure. GitHub Actions
consists of “Workflows”, each of which specifies a set of jobs to be executed
when a specific event occurs. GitHub Actions supports a wide range of these
events, from pull request creation, through to specific keywords appearing in
comments, to simple scheduled runs and many more. The jobs a workflow specifies
run commands in a series of steps. This makes Actions flexible enough for a
large number of automation tasks: some repos use it to automatically moderate
discussion threads, and others use it to build and deploy their service. Right
now we are just using it to run tests. GitHub looks for workflows specified in
YAML format in the “.github/workflows” directory. Our current workflow YAML
files can be found here:
https://github.com/gem5/gem5/blob/ad0a2d1beaa043c03c0e43406078b3a09a3861ac/.github/workflows/.
There are many resources online explaining how to write these YAML files to
specify jobs and trigger them on specific events, so I won’t go into further
detail than this high-level description.
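To make the event/job relationship concrete, here is a toy Python model of which jobs a given event triggers. This is not how GitHub actually evaluates workflows. The keys `on`, `jobs`, `runs-on` and `steps`, and the event names `pull_request`, `schedule` and `workflow_dispatch`, mirror real workflow syntax. The file names, job names, runner labels and step commands are hypothetical placeholders, not gem5's actual workflows. Steps are also simplified to plain command strings.

```python
from typing import Dict, List

# Hypothetical workflows shaped like their YAML counterparts: "on" lists the
# triggering events; "jobs" maps each job ID to its runner label and steps.
# (Real YAML also allows "on" to be a string or a mapping; simplified here.)
WORKFLOWS: Dict[str, dict] = {
    "pr-tests.yaml": {
        "on": ["pull_request"],
        "jobs": {
            "build": {"runs-on": "builder", "steps": ["echo 'build gem5'"]},
        },
    },
    "weekly-tests.yaml": {
        "on": ["schedule", "workflow_dispatch"],
        "jobs": {
            "long-tests": {"runs-on": "runner", "steps": ["echo 'run long tests'"]},
        },
    },
}


def jobs_for_event(workflows: Dict[str, dict], event: str) -> List[str]:
    """Return 'workflow/job' names for every workflow triggered by `event`."""
    triggered = []
    for wf_name, wf in workflows.items():
        if event in wf["on"]:
            triggered.extend(f"{wf_name}/{job}" for job in wf["jobs"])
    return triggered


print(jobs_for_event(WORKFLOWS, "pull_request"))  # ['pr-tests.yaml/build']
```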

There is one small limitation with GitHub Actions that will require us to
change our procedures. GitHub only reads the YAML files on the repository’s
main (default) branch, which for gem5 is the stable branch. This means that if
we want to update which tests are run, or how they are run, we need to update
the stable branch. After some discussion, we believe the best policy is to
permit patches to be submitted to the stable branch between releases, provided
they only change these YAML files. Since these files do not affect the
compilation or running of gem5, the stable branch is still “stable” with
respect to the end user’s interaction with gem5.
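A check along these lines could help reviewers apply the proposed stable-branch policy. It is a minimal sketch that assumes "workflow-only" means every changed file is a `.yaml` or `.yml` file directly under `.github/workflows`. The function name is hypothetical and is not part of any existing gem5 tooling.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Directory that GitHub reads workflow files from.
WORKFLOW_DIR = PurePosixPath(".github/workflows")


def is_workflow_only_change(changed_paths: Iterable[str]) -> bool:
    """Return True if every changed path is a YAML file directly in .github/workflows.

    Hypothetical policy helper: an empty change set is treated as not eligible.
    """
    paths = [PurePosixPath(p) for p in changed_paths]
    if not paths:
        return False
    return all(
        p.parent == WORKFLOW_DIR and p.suffix in (".yaml", ".yml") for p in paths
    )


print(is_workflow_only_change([".github/workflows/ci-tests.yaml"]))  # True
print(is_workflow_only_change([".github/workflows/ci-tests.yaml", "src/sim/main.cc"]))  # False
```

The list of changed paths could come from, for example, `git diff --name-only stable...HEAD`.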

Jobs run on “runners”. A runner is just a server which accepts GitHub jobs to
run, and each runner runs one job at a time. Typically you would pay GitHub to
use their runners, as most actions complete in a matter of seconds and so incur
little cost. That won’t work for us, as some of our tests take days to
complete. Fortunately, GitHub allows for “self-hosted runners”. With tooling
provided by GitHub, you can set up a runner on any machine you want and point
it towards the repository it is to accept jobs from. There is one big problem
with this: a self-hosted runner is not secure. With the right job
specification, you can execute whatever you want on the host hardware. A
smaller annoyance is that GitHub makes it hard, but not impossible, to run more
than one runner per machine. This is inconvenient when, ideally, you want
several runners executing jobs in parallel on machines that can handle them.
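Each runner executes only one job at a time, so the number of runners limits how many jobs can progress in parallel. The following sketch uses greedy list scheduling, where each job goes to whichever runner becomes free first, to estimate when a batch of jobs finishes. The job durations are made-up numbers. The model is also a simplification of how GitHub actually queues jobs.

```python
import heapq
from typing import List


def makespan(job_hours: List[float], num_runners: int) -> float:
    """Estimate when the last job finishes if each runner runs one job at a time.

    Greedy list scheduling: jobs are taken in order, and each one is assigned
    to the runner that becomes free earliest.
    """
    if num_runners < 1:
        raise ValueError("need at least one runner")
    free_at = [0.0] * num_runners  # min-heap of times at which each runner is free
    for hours in job_hours:
        start = heapq.heappop(free_at)          # earliest-available runner
        heapq.heappush(free_at, start + hours)  # busy until this job completes
    return max(free_at)


jobs = [3.0] * 8  # eight hypothetical 3-hour test jobs
print(makespan(jobs, 1), makespan(jobs, 4))  # 24.0 6.0
```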

Our solution to this is to set up runners inside virtual machines. This keeps
jobs off the host itself and lets a single machine host several runners. We
attempted to use Kubernetes for this but found it is more tailored towards
large cloud-based clusters, whereas we want to utilize the smaller number of
servers at our disposal. After some trial and error, we decided it wasn’t the
right tool for the job. Instead, we opted to use Vagrant to create VMs to host
the runners. I have documented all the scripts I used to do this here:
https://gem5-review.googlesource.com/c/public/gem5/+/71098. You can consult the
“README.md” for the procedure to set up your own runners. Though I have created
some scripts to semi-automate the process, it’s still quite manual. It would be
nice if there were a more “push button” way to deploy runners. In a similar
vein, if they break, we have to manually go in and restart them. There’s room
for improvement here.
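As one possible step towards fewer manual restarts, a small watchdog could poll Vagrant and bring stopped VMs back up. This is a hypothetical sketch and is not part of the scripts linked above. It assumes all runner VMs are defined in a single Vagrantfile in `vagrant_dir`. It also relies on `vagrant status --machine-readable` emitting `timestamp,target,type,data` lines, where lines of type `state` carry each VM's state. The VM names in the test data are invented.

```python
import subprocess
from typing import Dict, List


def parse_vagrant_states(machine_readable: str) -> Dict[str, str]:
    """Map each VM name to its state from `vagrant status --machine-readable` output.

    Each line has the form timestamp,target,type,data...; only lines whose
    type is exactly "state" are kept.
    """
    states = {}
    for line in machine_readable.splitlines():
        fields = line.split(",")
        if len(fields) >= 4 and fields[2] == "state":
            states[fields[1]] = fields[3]
    return states


def vms_needing_restart(states: Dict[str, str]) -> List[str]:
    """Return the names of VMs that are not in the "running" state, sorted."""
    return sorted(name for name, state in states.items() if state != "running")


def restart_stopped_vms(vagrant_dir: str) -> List[str]:
    """Run `vagrant up` for every non-running VM in `vagrant_dir`; return their names."""
    out = subprocess.run(
        ["vagrant", "status", "--machine-readable"],
        cwd=vagrant_dir, capture_output=True, text=True, check=True,
    ).stdout
    to_restart = vms_needing_restart(parse_vagrant_states(out))
    for name in to_restart:
        subprocess.run(["vagrant", "up", name], cwd=vagrant_dir, check=True)
    return to_restart
```

This only detects VMs that are not running. A VM that is up but whose runner process has crashed would need a separate check.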

Right now we have two types of VMs: “builders” and “runners”. Builders are
4-core, 16GB VMs whose primary purpose is to build gem5. Runners are
single-core, 6GB VMs whose purpose is to run instances of gem5. Aiming for a
rough 6-to-1 ratio, we have 26 runners and 4 builders spread over 3 machines,
though this is very lopsided, as one of our machines hosts 20 runners. In the
YAML files, each job is sent to either a runner or a builder based on its
“runs-on” field.
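Using only the numbers above, the allocation works out as follows. This counts VM allocations only and ignores host overhead. The `builder` and `runner` labels are illustrative and may not match the exact `runs-on` values in our YAML files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmType:
    label: str   # illustrative label, not necessarily the real "runs-on" value
    cores: int   # vCPUs per VM
    mem_gb: int  # memory per VM, in GB
    count: int   # number of VMs of this type


FLEET = [
    VmType("builder", cores=4, mem_gb=16, count=4),
    VmType("runner", cores=1, mem_gb=6, count=26),
]

total_cores = sum(v.cores * v.count for v in FLEET)    # 4*4 + 1*26
total_mem_gb = sum(v.mem_gb * v.count for v in FLEET)  # 16*4 + 6*26
ratio = FLEET[1].count / FLEET[0].count                # runners per builder

# The machine hosting 20 runners alone accounts for 20 vCPUs and 120GB.
busiest_cores, busiest_mem_gb = 20 * 1, 20 * 6

print(total_cores, total_mem_gb, ratio)  # 42 220 6.5
```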

Though this setup is currently functional, it does have some restrictions and 
pain-points. Of note:

- We do not have a runner which can run KVM tests, so for the time being these
are skipped. We’re not sure how feasible it is to put a runner in a VM that
allows KVM.
- The weekly GPU tests need a special Docker container to be built as part of
the tests, and we need more time to figure out how to do this. At present we
get errors, but we are working on finding a solution.
- We do not have good tools to orchestrate these VMs. If they go down and need
to be restarted, or new VMs need to be created, it requires manual effort.
- 20 of our runners are on a single machine. It’d be much better to have a more
distributed set of runners.
- All our machines are X86. It may be of value to have some ARM hosts too,
particularly to run ARM KVM tests.
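For anyone looking into the KVM problem, a useful first diagnostic is whether a VM can see a usable `/dev/kvm` at all. Inside a VM, this generally requires nested virtualization to be enabled on the host. This sketch assumes an x86 Linux host using the `kvm_intel` or `kvm_amd` modules. The function names are hypothetical. Run `host_nested_virt_enabled()` on the physical machine and `kvm_usable()` inside the runner VM.

```python
import os


def kvm_usable(dev: str = "/dev/kvm") -> bool:
    """Return True if the KVM device node exists and is readable and writable by us."""
    return os.path.exists(dev) and os.access(dev, os.R_OK | os.W_OK)


def host_nested_virt_enabled() -> bool:
    """Return True if the kvm_intel or kvm_amd module reports nested virtualization on."""
    for module in ("kvm_intel", "kvm_amd"):
        path = f"/sys/module/{module}/parameters/nested"
        try:
            with open(path) as f:
                # kvm_intel reports Y/N; kvm_amd reports 1/0.
                if f.read().strip() in ("Y", "1"):
                    return True
        except OSError:
            continue  # module not loaded (or not present) on this machine
    return False


if __name__ == "__main__":
    print("nested virtualization on host:", host_nested_virt_enabled())
    print("usable /dev/kvm here:", kvm_usable())
```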

If anyone reading this wants to help with the development of this
infrastructure, I’d be happy to accommodate their input. I realize many of the
parts explained here can be improved. Using the scripts I provide here:
https://gem5-review.googlesource.com/c/public/gem5/+/71098, you can set up your
own runner and test out different setups on your own forks of the gem5
repository. We’d also welcome improvements to our YAML scripts to better
utilize what we have and run better tests.

Kind regards,
Bobby
--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616
 
web: https://www.bobbybruce.net
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org