On 15/10/2023 at 13:34, Javier Jimenez Shaw via gdal-dev wrote:
Hi Even. Thanks, it sounds good.
However, I see a potential problem. I see that you use "SetCacheMax" once. We should not forget about that in the future for tests that are sensitive to the cache size. GDAL's cache is usually a percentage of the total memory, which may vary across environments and over time.

Javier,

What is sure is that the timings obtained in one session of the perf tests in CI are comparable to nothing other than timings obtained in the same session (and that is already challenging!). So the RAM available on the CI worker might affect the speed of the tests, but it will affect the reference run and the tested run in the same way (as long as the GDAL_CACHEMAX=5% setting remains the same and the general behaviour of the block cache remains similar).

I anticipate that at some point changes in GDAL might make the perf test suite no longer comparable to the current reference version, and that we will have to upgrade the commit of the reference version when that happens. Actually, if the perf test suite is extended, it might be useful to upgrade the commit of the reference version at release time of feature releases. For example, when GDAL 3.8.0 is released, it will become the reference point for 3.9.0 development, and so on (otherwise we wouldn't get perf regression testing of newly added tests). The downside is that this wouldn't catch progressive slowdowns over several release cycles. But given that I had to raise the threshold for failure to > 30% regression to avoid false positives, the perf test suite (at least when run in CI with all its unpredictability) can only catch major "instant" regressions.
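
As an illustration of that point (not part of the actual PR), a benchmark module could pin the block cache to a fixed size so that results do not depend on how much RAM the worker has; the fixture name and the 256 MB value below are made up:

    import pytest
    from osgeo import gdal

    @pytest.fixture(autouse=True)
    def fixed_block_cache():
        # Pin the block cache to a fixed 256 MB instead of the default
        # percentage of total RAM, so timings do not depend on the
        # memory of the machine running the tests.
        old = gdal.GetCacheMax()
        gdal.SetCacheMax(256 * 1024 * 1024)
        yield
        gdal.SetCacheMax(old)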

Even


On Wed, 11 Oct 2023, 07:53 Laurențiu Nicola via gdal-dev, <gdal-dev@lists.osgeo.org> wrote:

    Hi,

    No experience with pytest-benchmark, but I maintain an unrelated
    project that runs some benchmarks on CI, and here are some things
    worth mentioning:

     - we store the results as a newline-delimited JSON file in a
    different GitHub repository
    (https://raw.githubusercontent.com/rust-analyzer/metrics/master/metrics.json,
    warning, it's a 5.5 MB unformatted JSON); a sketch of this
    append-only format follows the list
     - we have an in-browser dashboard that retrieves the whole file
    and displays it: https://rust-analyzer.github.io/metrics/
     - we do track build time and overall run time, but we're more
    interested in correctness
     - the display is a bit of a mess (partly due to trying to keep
    the setup as simple as possible), but you can look for the "total
    time", "total memory" and "build" to get an idea
     - we store the runner CPU type and memory in that JSON; they're
    almost all Intel, but they do upgrade from time to time
     - we even have two AMD EPYC runs; note that boost is disabled in
    a different way there (we don't try to disable it, though)
     - we also try to measure the CPU instruction count (the perf
    counter), but it doesn't work on GitHub, and probably not in most VMs
     - the runners have been very reliable, but not really consistent
    in performance
     - a bigger problem for us is that somebody actually needs to
    look at the dashboard to spot any regressions and investigate them
    (some are caused by external changes)
     - in 3-5 years we'll probably have to trim down the JSON or
    switch to a different storage
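
    A minimal sketch of that append-only format (the field names are
    made up for illustration, not the actual metrics.json schema):

        import json
        import platform
        import time

        def append_metrics(path, timings):
            # Append one JSON object per line; the dashboard fetches the
            # whole file and parses it line by line.
            record = {
                "timestamp": int(time.time()),
                "cpu": platform.processor(),
                "timings": timings,  # e.g. {"total_time_s": 123.4}
            }
            with open(path, "a", encoding="utf-8") as f:
                f.write(json.dumps(record) + "\n")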

    Laurentiu

    On Tue, Oct 10, 2023, at 21:08, Even Rouault via gdal-dev wrote:
    > Hi,
    >
    > I'm experimenting with adding performance regression testing in our CI.
    > Currently our CI has quite extensive functional coverage, but totally
    > lacks performance testing. Given that we use pytest, I've spotted
    > pytest-benchmark (https://pytest-benchmark.readthedocs.io/en/latest/)
    > as a likely good candidate framework.
    >
    > I've prototyped things in https://github.com/OSGeo/gdal/pull/8538
    >
    > Basically, we now have an autotest/benchmark directory where
    > performance tests can be written.
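    >
    > For illustration only (not an actual test from the PR), such a test
    > could use pytest-benchmark's "benchmark" fixture roughly like this;
    > the dataset name is hypothetical:
    >
    >     from osgeo import gdal
    >
    >     def test_read_byte_tif(benchmark):
    >         def read():
    >             # Open a small raster and checksum its first band; the
    >             # benchmark fixture runs this repeatedly and records
    >             # timing statistics.
    >             ds = gdal.Open("data/byte.tif")
    >             ds.GetRasterBand(1).Checksum()
    >
    >         benchmark(read)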
    >
    > Then in the CI, we check out a reference commit, build it and run
    > the performance test suite in --benchmark-save mode.
    >
    > And then we run the performance test suite on the PR in
    > --benchmark-compare mode with a --benchmark-compare-fail="mean:5%"
    > criterion (which means that a test fails if its mean runtime is more
    > than 5% slower than the reference one).
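    >
    > Condensed into a sketch (the real CI builds two separate checkouts;
    > calling pytest from Python here is only for brevity), the two phases
    > look like:
    >
    >     import pytest
    >
    >     # Reference commit: record the baseline timings
    >     pytest.main(["autotest/benchmark", "--benchmark-save=ref"])
    >
    >     # PR branch: compare against the latest saved baseline, failing
    >     # any test whose mean runtime regressed by more than 5%
    >     pytest.main(["autotest/benchmark",
    >                  "--benchmark-compare",
    >                  "--benchmark-compare-fail=mean:5%"])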
    >
    > From what I can see, pytest-benchmark behaves correctly if tests are
    > removed or added (that is, not failing, just skipping them during
    > comparison). The only thing one should not do is modify an existing
    > test w.r.t. the reference branch.
    >
    > Does anyone have practical experience with pytest-benchmark, in
    > particular in CI setups? With virtualization, it is hard to guarantee
    > that other things happening on the host running the VM won't
    > interfere. Even locally on my own machine, I initially saw strong
    > variations in timings, which can be reduced to an acceptable deviation
    > by disabling the Intel Turbo Boost feature (echo 1 | sudo tee
    > /sys/devices/system/cpu/intel_pstate/no_turbo).
    >
    > Even
    >
    > --
    > http://www.spatialys.com
    > My software is free, but my time generally not.
    >



--
http://www.spatialys.com
My software is free, but my time generally not.
