I'm thinking of setting up a GitHub Actions bot to track metrics for each commit/merge. Basically, it should run the same benchmark code twice: once with a Nim compiler built without the commit, and once with a Nim compiler built with the commit. Finally, it comments on the PR, so that we have long-term results which we can look up at any time and find performance regressions more easily. I have an unfinished prototype here: <https://github.com/nim-lang/Nim/pull/19941>
Any ideas about how to make the benchmarks more reliable, or which metrics to collect? As a start, I propose collecting compilation time and memory usage for each commit/merge.
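
To make the idea concrete, here is a rough Python sketch of the comparison step. It is not taken from the prototype PR. It assumes a Unix runner (it relies on `os.wait4`) and Python 3.9+, and every function name in it (`measure`, `sample`, `percent_change`, `format_row`) is made up for illustration. It runs a command several times, takes the median wall time and the largest peak RSS, and formats one row of a markdown table for the PR comment.

```python
import os
import statistics
import subprocess
import time


def measure(cmd):
    """Run cmd once and return (wall_seconds, peak_rss) for that child only.

    Unix-only: os.wait4 reports resource usage of the specific child process.
    ru_maxrss is in KiB on Linux and in bytes on macOS.
    """
    start = time.perf_counter()
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    _, status, usage = os.wait4(proc.pid, 0)
    elapsed = time.perf_counter() - start
    # We reaped the child ourselves, so record the exit code on the Popen object.
    proc.returncode = os.waitstatus_to_exitcode(status)
    if proc.returncode != 0:
        raise RuntimeError(f"{cmd!r} exited with code {proc.returncode}")
    return elapsed, usage.ru_maxrss


def sample(cmd, runs=5):
    """Repeat the measurement; the median time damps outliers from a noisy runner."""
    results = [measure(cmd) for _ in range(runs)]
    times = [t for t, _ in results]
    peaks = [m for _, m in results]
    return statistics.median(times), max(peaks)


def percent_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


def format_row(name, baseline, candidate, unit):
    """One markdown table row for the PR comment."""
    change = percent_change(baseline, candidate)
    return (
        f"| {name} | {baseline:.2f} {unit} | {candidate:.2f} {unit} "
        f"| {change:+.1f}% |"
    )
```

With two compiler builds at hypothetical paths such as `./nim_base` and `./nim_pr`, one would call `sample(["./nim_base", "c", "bench.nim"])` and `sample(["./nim_pr", "c", "bench.nim"])` and pass the two results to `format_row`. Shared CI runners are noisy, so small differences should be read with caution.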
