For tcmalloc, do you need to build from master? This recent release seems to have CMake support: https://github.com/gperftools/gperftools/releases/tag/gperftools-2.15
Of course, I do not mean to force its usage. But it could be a suggestion in case we do not find anything better and a user has problems, or a way to inspire later research. For us it is definitely helping.

Cheers,
Javier

On Thu, 21 Mar 2024 at 14:59, Even Rouault via gdal-dev <gdal-dev@lists.osgeo.org> wrote:

> I've played with VirtualAlloc(NULL, SINGLE_ALLOC_SIZE, MEM_COMMIT |
> MEM_RESERVE, PAGE_READWRITE), and it does avoid the performance issue.
> However I see that VirtualAlloc() allocates by chunks of 64 kB, so
> depending on the size of a block, it might cause significant waste of
> RAM, so that can't be used as a direct replacement of malloc().
>
> My inclination would be to perhaps have an optional config option like
> GDAL_BLOCK_CACHE_USE_PRIVATE_HEAP that could be set, and when doing so
> it would use HeapCreate(0, 0, GDAL_CACHEMAX) to create a heap only used
> by the block cache. Not ideal, since that would reserve the whole
> GDAL_CACHEMAX (but for a large enough processing, you'll end up
> consuming it), but it has the advantage of not being extremely
> intrusive either... and could be easily ditched/replaced by something
> better in the future.
>
> Regarding tcmalloc, I've had to use it on Linux too, but only on
> scenarios involving multithreading where it helps reduce RAM
> fragmentation: cf
> https://gdal.org/user/multithreading.html#ram-fragmentation-and-multi-threading
> . I've just tried quickly to use it on Windows to test it on the
> scenario, but didn't really manage to make it work. Even building it
> was challenging. Actually I tried
> https://github.com/gperftools/gperftools and I had to build from master
> since the latest tagged version doesn't build with CMake on Windows.
> But then nothing happens when linking tcmalloc_minimal.lib against my
> toy app. I probably missed something.
>
> Anyway I don't really think we can force tcmalloc to be used in GDAL,
> as a library. Unless there would be a way to have its allocator be
> optionally used at places that we control (i.e. explicitly call
> tc_malloc / tc_free), and not replace the default malloc / free etc.,
> which might be undesirable when GDAL is just a component of a larger
> application.
>
> Disabling the block cache entirely (or setting it to a minimum value)
> is only a workable option for uncompressed formats, or if you use
> per-band blocks (INTERLEAVE=BAND in GTiff language) and not one block
> for all bands (INTERLEAVE=PIXEL), otherwise you'll pay for the
> decompression multiple times.
>
> On 21/03/2024 at 14:38, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS
> AND APPLICATIONS INC] via gdal-dev wrote:
>
> > +1. We use a variety of hand-rolled VirtualAlloc-based allocators
> > (for basic tasks, a simple pointer bump, and for more elaborate
> > needs, a ‘buddy’ allocator), some of which try to be smart about
> > memory usage via de-committing regions. In our work, we tend to
> > disable the GDAL cache entirely and rely on the file system’s file
> > cache instead, which is a simplification we can make but is surely
> > untenable in general here.
> >
> > From: gdal-dev <gdal-dev-boun...@lists.osgeo.org> on behalf of
> > Abel Pau via gdal-dev <gdal-dev@lists.osgeo.org>
> > Reply-To: Abel Pau <a....@creaf.uab.cat>
> > Date: Thursday, March 21, 2024 at 4:51 AM
> > To: "gdal-dev@lists.osgeo.org" <gdal-dev@lists.osgeo.org>
> > Subject: [EXTERNAL] [BULK] Re: [gdal-dev] Experience with slowness
> > of free() on Windows with lots of allocations?
> >
> > Hi Even,
> >
> > You’re right. We also know that. When programming the driver I took
> > it into consideration. Our solution is not to rely on Windows to do
> > a good job with memory; we try to reuse as much memory as possible
> > instead of calling calloc/free freely.
> >
> > For instance, in the driver, for each feature I have to get or write
> > the coordinates. I could do it every time I have to, so lots of
> > times: create memory for reading, and then put them on the feature,
> > and then free... so many times. What do I do instead? When opening
> > the layer I create some memory blocks of 250 MB (due to the format
> > itself) and I use that created memory to manage whatever I need. And
> > when closing, I free it.
> >
> > While doing that I observed that sometimes I have to use GDAL code
> > that doesn’t take it into consideration (CPLRecode() for instance).
> > Perhaps it could be improved as well.
> >
> > Thanks for noticing that.
> >
> > From: gdal-dev <gdal-dev-boun...@lists.osgeo.org> on behalf of
> > Javier Jimenez Shaw via gdal-dev
> > Sent: Thursday, 21 March 2024 8:27
> > To: Even Rouault <even.roua...@spatialys.com>
> > CC: gdal dev <gdal-dev@lists.osgeo.org>
> > Subject: Re: [gdal-dev] Experience with slowness of free() on
> > Windows with lots of allocations?
> >
> > In my company we confirmed that "Windows heap allocation mechanism
> > sucks."
> >
> > Closing the application after using the gtiff driver can take many
> > seconds due to memory deallocations.
> >
> > One workaround was to use tcmalloc. I will ask my colleagues for
> > more details next week.
> >
> > On Thu, 21 Mar 2024, 01:55 Even Rouault via gdal-dev,
> > <gdal-dev@lists.osgeo.org> wrote:
> >
> > Hi,
> >
> > while investigating
> > https://github.com/OSGeo/gdal/issues/9510#issuecomment-2010950408,
> > I've come to the conclusion that the Windows heap allocation
> > mechanism sucks. Basically if you allocate a lot of heap regions of
> > modest size with malloc()/new[], freeing them all with the
> > corresponding free()/delete[] is excruciatingly slow (like ~ 10
> > seconds for ~ 80,000 allocations). The slowness is clearly quadratic
> > with the number of allocations. You only start noticing it with
> > ~ 30,000 allocations. And interestingly, another condition for that
> > slowness is that each individual allocation must be strictly greater
> > than 4096 * 4 bytes. At exactly that value, perf is acceptable, but
> > add one extra byte, and it suddenly drops. I suspect that there must
> > be a threshold from which malloc() starts using VirtualAlloc()
> > instead of the heap, which must involve slow system calls, instead
> > of a user-land allocation mechanism.
> >
> > Has anyone already hit that and found solutions? The only potential
> > idea I found until now would be to use a private heap with
> > HeapCreate() with a fixed maximum size, which is a bit problematic
> > to adopt by default: basically that would mean that the size of
> > GDAL_CACHEMAX would be consumed as soon as one uses the block cache.
> >
> > Even
> >
> > --
> > http://www.spatialys.com
> > My software is free, but my time generally not.
> >
> > _______________________________________________
> > gdal-dev mailing list
> > gdal-dev@lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
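
For readers skimming the thread: the "allocate a big block once and hand out pieces of it" approach mentioned above (the preallocated blocks in the driver, the "simple pointer bump" allocator) could look roughly like the sketch below. This is only an illustration under assumptions, not GDAL code: BumpArena and its methods are hypothetical names, and the backing block comes from std::malloc to keep the example portable, whereas the thread talks about VirtualAlloc or HeapCreate on Windows.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical illustration (not part of GDAL): a fixed-capacity bump arena.
// One large block is obtained up front; each allocation just advances an
// offset, and everything is released with a single free() at destruction.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity)
        : base_(static_cast<unsigned char*>(std::malloc(capacity))),
          capacity_(capacity),
          offset_(0) {
        if (base_ == nullptr) throw std::bad_alloc();
    }

    ~BumpArena() { std::free(base_); }

    BumpArena(const BumpArena&) = delete;
    BumpArena& operator=(const BumpArena&) = delete;

    // Returns a pointer aligned to `align` (must be a power of two),
    // or nullptr if the arena does not have enough room left.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        const std::uintptr_t start =
            reinterpret_cast<std::uintptr_t>(base_) + offset_;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        const std::uintptr_t aligned = (start + mask) & ~mask;
        const std::size_t padding = static_cast<std::size_t>(aligned - start);
        const std::size_t remaining = capacity_ - offset_;
        if (padding > remaining || size > remaining - padding) return nullptr;
        offset_ += padding + size;
        return reinterpret_cast<void*>(aligned);
    }

    // Makes the whole block reusable without returning it to the system.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }
    std::size_t capacity() const { return capacity_; }

private:
    unsigned char* base_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

A caller would create one arena when opening a layer or dataset, call allocate() for each temporary buffer, and call reset() between features instead of pairing every allocation with its own free(). The trade-off is the one already raised in the thread: the full capacity is held for the lifetime of the arena, and individual pieces cannot be released on their own.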
_______________________________________________ gdal-dev mailing list gdal-dev@lists.osgeo.org https://lists.osgeo.org/mailman/listinfo/gdal-dev