As Even said, this is a really tough topic. I have tried some micro-benchmarking for small bits of code, and for short-term development this is sort of OK. The biggest problem is getting a stable test environment for benchmarking. Even a single user machine doing nothing but benchmarking is all over the place, and if you benchmark on a fleet, differences in the other tasks running and in the exact specs of the machines make the data extremely noisy. Even with binary sizes and RAM usage I saw large run-to-run variance, because slight changes in dependencies change how the compiler's optimizers behave. For timing, landing on different hardware is bad, but even within a single hardware configuration, bus contention, how warm the caches stay, SSD performance, the network, and other systems can be highly variable.
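To make the variance point concrete, here is a minimal sketch of the usual mitigation: warm up first, repeat many times, and report the minimum and median rather than a single run. The helper name `bench` and the warmup/repeat counts are illustrative, not anything from GDAL's or TorchGeo's actual tooling:

```python
import statistics
import time


def bench(fn, *, warmup=3, repeats=15):
    """Time fn() repeatedly and summarize, to damp run-to-run noise.

    Warmup iterations let caches, branch predictors, and any lazy
    initialization settle before measurements are recorded.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # The minimum approximates the noise-free cost; the median is a
    # robust "typical" figure; the spread shows how unstable the box is.
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "spread": max(samples) - min(samples),
    }


if __name__ == "__main__":
    print(bench(lambda: sum(range(100_000))))
```

On a noisy shared machine the spread can easily exceed the effect you are trying to measure, which is exactly why single-shot timings are untrustworthy.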
On Sun, Feb 25, 2024, 5:27 AM Adam Stewart via gdal-dev <gdal-dev@lists.osgeo.org> wrote:

> Thanks Even,
>
> I think what I'm envisioning is more of an integration test than a unit test. We don't intend to run this in TorchGeo CI on every commit, only on PRs that we know may impact I/O (much less frequent than in GDAL). We would also run it before each release and publish performance metrics to prevent regressions. Since it would be run infrequently and manually, we wouldn't suffer from the same issue of a +20% tolerance buffer, and could actually run multiple times and average.
>
> For TorchGeo, we definitely want to consider full-sized tiles/scenes, not small synthetic patches. Many of our sampling strategies and design decisions require multiple large scenes to validate accurately.
>
> Unless someone chimes in with different opinions, it sounds like there is room for a research paper on this topic. I would love to include some GDAL developers on such a paper if anyone is interested. I will talk this over with my own research group.
>
> P.S. We've also been thinking about how to improve GPU support in GDAL. The lowest-hanging fruit is anything that can be formulated as matrix multiplication, such as the affine transformations in gdalwarp. Unfortunately, I don't know anything about CUDA/ROCm. If we were to do this in TorchGeo, we would just use PyTorch, which has a lot of overhead you won't need in GDAL. But let's discuss this in a different thread; I don't want to derail this conversation.
>
> *Dr. Adam J. Stewart*
> Technical University of Munich
> School of Engineering and Design
> Data Science in Earth Observation
>
> On Feb 25, 2024, at 13:25, Even Rouault <even.roua...@spatialys.com> wrote:
>
> Adam,
>
> Automated performance regression testing is probably one of the aspects of testing that could be enhanced. While the GDAL autotest suite is quite comprehensive functionality-wise, performance testing has traditionally lagged a bit.
> That said, this is an aspect we have improved lately with the addition of a benchmark component to the autotest suite: https://github.com/OSGeo/gdal/tree/master/autotest/benchmark . This is admittedly quite minimalistic for now, but it tests some scenarios involving the GTiff driver and gdalwarp.
>
> To test non-regression for a pull request, we have a CI benchmark configuration (https://github.com/OSGeo/gdal/blob/master/.github/workflows/linux_build.yml#L111 + https://github.com/OSGeo/gdal/tree/master/.github/workflows/benchmarks) that runs the benchmarks first against master, and then with the pull request (during the same run on the same worker). But we need to allow a quite large tolerance threshold (up to +20%) to account for the fact that accurate timing measurements are extremely hard to get on CI infrastructure (even locally, this is very challenging for microbenchmarks). So this will mostly catch big regressions, not subtle ones.
>
> One of the difficulties with benchmark testing is that we don't want the tests to run for hours, especially for pull requests, so they need to be written carefully so that they still trigger the relevant code paths and mechanisms of the code base that are exercised by real-world large datasets, while running in at most a few seconds each. Typically those tests also autogenerate their test data, to avoid the test suite depending on overly large datasets and to keep the repository size as small as possible.
>
> As you mention GPUs, we have had private contacts from a couple of GPU makers in recent years about potential GPU'ification of GDAL, but this has led nowhere so far. Some mentioned that moving data acquisition to the GPU could be interesting performance-wise, but that seems to be a huge undertaking, basically porting the GTiff driver, libtiff, and its codecs to GPU code. And even if that were done, how would we manage the resulting code duplication?
> We aren't even able to properly keep the OpenCL warper, contributed many years ago, in sync with the CPU warping code. We also lack the GPU expertise in the current team to do that.
>
> Even
>
> On 25/02/2024 at 12:58, Adam Stewart via gdal-dev wrote:
>
> Hi,
>
> *Background*: I'm the developer of the TorchGeo <https://github.com/microsoft/torchgeo> software library. TorchGeo is a machine learning library that relies heavily on GDAL (via rasterio/fiona) for satellite imagery I/O.
>
> One of our primary concerns is ensuring that we can load data from disk fast enough to keep the GPU busy during model training. Of course, satellite imagery is often distributed in large files that make this challenging. We use various tricks to optimize performance (COGs, windowed reading, caching, compression, parallel workers, etc.). In our initial paper <https://arxiv.org/abs/2111.08872>, we chose to create our own arbitrary I/O benchmarking dataset composed of 100 Landsat scenes and 1 CDL map. See Figure 3 for the results, and Appendix A for the experiment details.
>
> *Question*: is there an official dataset that the GDAL developers use to benchmark GDAL itself? For example, if someone makes a change to how GDAL handles certain I/O operations, I assume the GDAL developers will benchmark it to see if I/O is now faster or slower. I'm envisioning experiments similar to https://kokoalberti.com/articles/geotiff-compression-optimization-guide/ for various file formats, compression levels, block sizes, etc.
>
> If such a dataset doesn't yet exist, I would be interested in creating one and publishing a paper on how it can be used to develop libraries like GDAL and TorchGeo.
>
> *Dr. Adam J. Stewart*
> Technical University of Munich
> School of Engineering and Design
> Data Science in Earth Observation
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
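As an editorial aside, the master-vs-PR comparison with a generous tolerance that Even describes above boils down to a very simple check. The function name and structure below are purely illustrative (this is not GDAL's actual CI code); only the +20% figure comes from the thread:

```python
def check_no_regression(baseline_s, candidate_s, tolerance=0.20):
    """Return True if the candidate timing is within `tolerance` of baseline.

    A large tolerance (+20% here) is needed because shared CI workers
    produce very noisy timings; only big regressions are caught.
    """
    return candidate_s <= baseline_s * (1.0 + tolerance)


# Example: master ran in 1.00 s, the PR in 1.15 s -> within +20%, passes.
print(check_no_regression(1.00, 1.15))  # True
print(check_no_regression(1.00, 1.35))  # False
```

Running both timings in the same CI job on the same worker, as the thread describes, is what makes even this coarse comparison meaningful: it cancels out most machine-to-machine variation, leaving only within-run noise.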
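Adam's point that affine transformations can be formulated as matrix multiplication can be made concrete: in homogeneous coordinates, a 2-D affine map is a single 3x3 matrix, so whole batches of pixel coordinates can be transformed with one matmul, which is exactly the shape of work GPUs excel at. A minimal NumPy sketch (NumPy here is just for illustration; this is not how gdalwarp is implemented internally):

```python
import numpy as np

# A 2-D affine transform (scale + translate) as a 3x3 matrix acting on
# homogeneous coordinates [x, y, 1].
A = np.array([
    [2.0, 0.0, 10.0],   # x' = 2*x + 10
    [0.0, 0.5, -5.0],   # y' = 0.5*y - 5
    [0.0, 0.0,  1.0],
])

# A batch of pixel coordinates, one per row.
pts = np.array([[0.0, 0.0], [1.0, 2.0], [4.0, 8.0]])
homog = np.hstack([pts, np.ones((len(pts), 1))])

# One matrix multiplication transforms the whole batch at once.
out = (A @ homog.T).T[:, :2]
print(out)  # [[10. -5.], [12. -4.], [18. -1.]]
```

The same expression runs unchanged on a GPU under PyTorch/CuPy-style array libraries, which is why Adam calls it the lowest-hanging fruit, though the thread notes the hard part for GDAL is everything around it (decoding, I/O, and avoiding a duplicated code path).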