This is great. I've wanted this for a long time. Joachim, could you write a wiki page with step-by-step instructions for how to set this up, detailed enough that e.g. one of our infrastructure volunteers could set it up on another machine?
Haskell infrastructure people, do we have a (e.g. Hetzner) machine that we can run this on?

On Wed, Jul 16, 2014 at 10:02 AM, Joachim Breitner <m...@joachim-breitner.de> wrote:

> Hi,
>
> I guess it’s time to talk about this, especially as Richard just brought
> it up again...
>
> I felt that we were seriously lacking in our grip on performance issues.
> We don’t even know whether 6.8.3 was better or worse than 6.8.3 or 7.6.4
> in terms of nofib, not to speak of the effect of each single commit.
>
> I want to change that, so I set up a benchmark monitoring dashboard. You
> can currently reach it at:
>
> http://ghcspeed-nomeata.rhcloud.com/
>
> What does it do?
> ~~~~~~~~~~~~~~~~
>
> It monitors the repository (master branch only) and builds each commit,
> complete with the test suite and nofib. The log is saved and analyzed,
> and some numbers are extracted:
>  * The build time
>  * The test suite summary numbers
>  * Runtime (if >1s), allocations and binary sizes of the nofib
>    benchmarks
>
> These are uploaded to the website above, which is powered by codespeed,
> a general performance dashboard, implemented in Python using Django.
>
> Under _Changes_, it provides a report for each commit (changes wrt. the
> previous version, and wrt. 10 revisions earlier, the so-called
> “trend”). A summary of these reports is visible on the front page.
>
> The _Timeline_ is a graph for each individual performance number. If
> there are bumps, you can hopefully find them there! You can also compare
> to 7.8.3, which is available as a “baseline”.
>
> _Comparison_ will be more useful if we have more tagged revisions, or if
> we were benchmarking various options (e.g. -fllvm): Here you can do
> bar-chart comparisons.
>
> Why codespeed?
> ~~~~~~~~~~~~~~
>
> For a long time I searched for a suitable software product, and one
> criterion is that it should be open source, rather simple to set up and
> mostly decoupled from other tools, i.e.
> something that I throw numbers
> at and which then displays them nicely. While I don’t think codespeed is
> the best performance dashboard out there (I find
> http://goperfd.appspot.com/perf a bit better; I wonder how well
> codespeed scales to even larger numbers of benchmarks and I wish it were
> more git-aware), it was the easiest to get started with. And thanks to
> the loose coupling of (1) running the tests to acquire a log, (2)
> parsing the log to get numbers and (3) putting them on a server, we can
> hopefully replace it when we come across something better. I was hoping
> for the Phabricator guys to have something in their tool suite, but
> it doesn’t look like it.
>
> How does it work (currently)?
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> My office PC is underused (I work on my laptop), so it’s currently
> dedicated to it. I have a simple shell script that monitors the repo for
> new versions. It builds the newest revision and works itself back to the
> commit where everything was turned into submodules:
> https://github.com/nomeata/codespeed/blob/ghc/tools/ghc/watch.sh
>
> It calls a script that does the actual building:
> https://github.com/nomeata/codespeed/blob/ghc/tools/ghc/run-speed.sh
> This produces a log file which should contain all the required numbers
> somewhere.
>
> A second script extracts these numbers (with help of nofib-analyze) and
> converts them into codespeed compatible JSON files:
> https://github.com/nomeata/codespeed/blob/ghc/tools/ghc/log2json.pl
>
> Finally, a simple invocation of curl uploads them to codespeed:
> https://github.com/nomeata/codespeed/blob/ghc/tools/ghc/upload.sh
>
> So if you want additional benchmarks to be tracked, make sure they are
> present in the logs and adjust log2json.pl. codespeed will automatically
> pick up new benchmarks in these logs. Reimplementations in Haskell are
> also welcome :-)
>
> The testsuite is run with VERBOSE=4, so the performance numbers are also
> shown for failing test cases.
> So once a test case goes over the limit,
> you can grep through previous logs to try to find the real culprit. I
> uploaded the logs (so far) to https://github.com/nomeata/ghc-speed-logs
> (but this is not automated yet, ping me if you need an update on this).
>
> What next?
> ~~~~~~~~~~
>
> Clearly, the current setup is only good enough to evaluate the system.
> Eventually, I might want to use my office PC again, and the free hosting
> on openshift is not very powerful.
>
> So if we want to keep this setup and make it “official”, we need to find a
> permanent solution.¹ This involves:
>
>  * A dedicated machine to run the benchmarks. This probably shouldn’t be
>    a VM, if we want to keep the noise in the runtime down.
>  * A machine to run the codespeed server. Can be a VM, or even run on
>    any of the systems that we have right now. Just needs a database
>    (postgresql preferably) and a webserver supporting WSGI (i.e. any
>    of them).
>  * Maybe a better place to store the logs for public consumption.
>
> Also, there are ways to improve the system:
>
>  * As I said, I don’t think codespeed is the best. If we find something
>    better, we can replace it. Since we have all the logs, we can easily
>    fill the new system with the data, or even run both at the same time.
>  * We might want to have more numbers. I am already putting
>    lines-of-code and disk space usage numbers into the logs, but do not
>    parse them yet.
>  * In particular, we might want to put in each performance test case as
>    a benchmark of its own, to more easily find commits that degrade (or
>    improve!) performance. I’m not sure how well the web page will handle
>    that.
>  * We might want to replace my rather simple watch.sh script by
>    something more serious. In particular, I imagine that our builder
>    setup could manage this, with a dedicated builder doing the
>    benchmark runs and the builder server scheduling a build for each
>    commit.
>
>
> That’s it for now. Enjoy clicking around!
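On replacing watch.sh: the selection rule described above ("builds the newest revision and works itself back") is small enough to sketch. This is only an illustration in Python, not a transcription of watch.sh; the function name, the in-memory list of commit ids and the `oldest` cut-off parameter are all assumptions made for the example.

```python
def next_commit(history, done, oldest):
    """Pick the next commit to benchmark.

    history: commit ids, newest first (hypothetical input, e.g. from git log)
    done:    set of commit ids that already have results
    oldest:  last commit worth building (e.g. the submodule switch-over)
    """
    for commit in history:
        if commit not in done:
            # Newest commit without results wins.
            return commit
        if commit == oldest:
            # Reached the cut-off and everything newer is done.
            break
    return None


history = ["d", "c", "b", "a"]
print(next_commit(history, {"d"}, oldest="b"))
print(next_commit(history, {"d", "c", "b"}, oldest="b"))
```

A builder server scheduling one build per commit would make this loop unnecessary, but the same rule is a reasonable way to backfill history.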
>
> Greetings,
> Joachim
>
> ¹ I guess that could be considered beta-reduction :-)
>
> --
> Joachim Breitner
>   e-Mail: m...@joachim-breitner.de
>   Homepage: http://www.joachim-breitner.de
>   Jabber-ID: nome...@joachim-breitner.de
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
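For anyone reimplementing steps (2) and (3) from the mail above, here is a minimal Python sketch of how extracted numbers might be shaped for codespeed. The field names and the /result/add/json/ endpoint are as I recall them from the codespeed README, so please check them against the deployed version. The project, executable and environment names, the commit id and the benchmark names are made up for illustration, and this is not what log2json.pl or upload.sh actually do.

```python
import json
from urllib.parse import urlencode


def to_codespeed_records(commit_id, results, project="GHC", branch="default",
                         executable="ghc", environment="benchmark-box"):
    """Turn {benchmark name: value} into a list of codespeed result dicts.

    All default names here are placeholders, not the real dashboard's setup.
    """
    return [
        {
            "commitid": commit_id,
            "project": project,
            "branch": branch,
            "executable": executable,
            "environment": environment,
            "benchmark": name,
            "result_value": value,
        }
        for name, value in sorted(results.items())
    ]


def upload_body(records):
    """Form-encoded body for a POST to <codespeed>/result/add/json/ (assumed)."""
    return urlencode({"json": json.dumps(records)})


# Hypothetical numbers for one commit; nothing is sent over the network.
records = to_codespeed_records(
    "0123abc", {"nofib/time/fib": 1.25, "nofib/allocs/fib": 1048576}
)
body = upload_body(records)
print(len(records), body[:5])
```

The resulting body is what a tool like curl would send with --data; keeping the conversion separate from the upload preserves the loose coupling the mail describes.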