David Pirotte and Guile Devel,
Changing GC_LARGE_ALLOC_WARN_INTERVAL seems like the best solution to me. Since changing the allocator was suggested earlier in the discussion, I looked further into whether that alternative was workable, so that GC_LARGE_ALLOC_WARN_INTERVAL would not have to be changed. I did some digging on https://github.com/ivmai/bdwgc and http://www.hboehm.info/gc/, and I have dug around a bit inside bytevectors.c in the past. My conclusion is that while changing the allocator properly would fix the warning, it could introduce worse problems. So, sadly, changing GC_LARGE_ALLOC_WARN_INTERVAL seems to be the only solution, unless my analysis is wrong (it would be nice if it were, since changing the allocator could help performance too).

My analysis follows.

The warning is issued because large arrays can cause major performance problems for conservative garbage collectors like BDWGC. Such a collector decides whether to keep or collect an object based on whether anything on the stack, or in other objects that are being kept, points to it (either at its head or somewhere in its interior). The standard BDWGC malloc (GC_MALLOC) allocates objects that could potentially contain pointers and therefore need to be scanned for pointers to other objects, and such scanning can be expensive. The BDWGC atomic malloc (GC_MALLOC_ATOMIC) declares that the object does not contain pointers and thus does not need to be scanned, which saves the GC a lot of effort. But regardless of whether an allocation is atomic (no pointers) or not, BDWGC still needs to scan everything else for pointers to it. Structs and the like that contain a pointer or two have to be declared as containing pointers, even if the rest of their data is not pointers, and that rest is effectively a set of random pointers when BDWGC looks at it. The same goes for everything on the stack that is not a pointer.
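To make the conservative check concrete, here is a rough toy model in C. It is not BDWGC's actual code, and looks_like_reference and count_apparent_references are hypothetical names. It treats every scanned word as a potential address and counts how many land inside an object's address range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the test a conservative collector applies to each word it
   scans: if the word, read as an address, lands anywhere inside
   [base, base + size), the object is treated as referenced, whether or
   not the word really was a pointer. */
static bool
looks_like_reference (uintptr_t word, uintptr_t base, size_t size)
{
  return word >= base && word - base < size;
}

/* Count how many scanned words (stack slots, or the contents of
   non-atomic objects) would keep the object at [base, base + size) alive. */
static size_t
count_apparent_references (const uintptr_t *words, size_t nwords,
                           uintptr_t base, size_t size)
{
  assert (words != NULL || nwords == 0);
  size_t hits = 0;
  for (size_t i = 0; i < nwords; i++)
    if (looks_like_reference (words[i], base, size))
      hits++;
  return hits;
}
```

For example, of the words {0x10008, 0x12345, 0x80000, 42}, only one falls inside a 64-byte object at 0x10000, but three fall inside a 1 MiB object at the same address, which is the size effect described next.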
The larger an allocated array is, the more likely it is that some non-pointer data, viewed as a pointer, will accidentally point into its interior. This is why BDWGC issues the warning when large arrays are allocated repeatedly with GC_MALLOC or GC_MALLOC_ATOMIC: there is a high probability that many of them will be kept around longer than required (and thus take up RAM) because non-pointers accidentally point to them, so BDWGC lets the programmer and user know. To help mitigate this, BDWGC offers GC_MALLOC_IGNORE_OFF_PAGE and GC_MALLOC_ATOMIC_IGNORE_OFF_PAGE (note that the latter is not mentioned in the README on the git page, but is mentioned at http://www.hboehm.info/gc/gcinterface.html). These perform the same allocations, but an object only counts as pointed to if a pointer (or data BDWGC thinks might be a pointer) points into its first 512 bytes. Since such objects then look like 512-byte objects to BDWGC when it decides whether to keep or collect them, the probability of them being accidentally kept around for a long time is much lower. There is one major catch: if one is still using the object but one's only pointer to it points somewhere past the 512-byte mark, it could be collected prematurely.

Now, turning to SRFI-4 vectors and R6RS bytevectors, which share most of their underlying code in Guile: they are allocated in make_bytevector with GC_MALLOC_ATOMIC (indirectly, through SCM_GC_MALLOC_POINTERLESS), and the function returns an SCM holding a pointer to the head of the allocation. In principle, make_bytevector could be changed to check the size and use GC_MALLOC_ATOMIC_IGNORE_OFF_PAGE when it is larger than 100 kB. (Switching to the non-atomic GC_MALLOC_IGNORE_OFF_PAGE would also get rid of the warning and keep the bytevector from being retained too long by accident, but the bytevector would then be scanned for pointers, which could in turn keep other objects alive by accident.) The only remaining worry would be that a bytevector could be collected while it is still in use.
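A rough sketch of that size check in C. LARGE_BYTEVECTOR_THRESHOLD, choose_bytevector_allocator and alloc_bytevector_storage are hypothetical names, not Guile code, and reading "100 kB" as 100 * 1024 bytes is my assumption. The allocators are passed in as function pointers so the sketch compiles without libgc; with BDWGC one would presumably pass GC_malloc_atomic and GC_malloc_atomic_ignore_off_page, the functions behind the macros above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h> /* malloc, to try the sketch without libgc */

/* Hypothetical threshold: "100 kB" read as 100 * 1024 bytes. */
#define LARGE_BYTEVECTOR_THRESHOLD ((size_t) 100 * 1024)

/* Same shape as GC_malloc_atomic and GC_malloc_atomic_ignore_off_page. */
typedef void *(*gc_alloc_fn) (size_t nbytes);

enum bv_alloc_kind
{
  BV_ALLOC_ATOMIC,                /* ordinary pointerless allocation */
  BV_ALLOC_ATOMIC_IGNORE_OFF_PAGE /* only pointers near the start count */
};

/* Decide which pointerless allocator a make_bytevector-like function
   would use for a buffer of NBYTES bytes. */
static enum bv_alloc_kind
choose_bytevector_allocator (size_t nbytes)
{
  return nbytes > LARGE_BYTEVECTOR_THRESHOLD
         ? BV_ALLOC_ATOMIC_IGNORE_OFF_PAGE
         : BV_ALLOC_ATOMIC;
}

/* Allocate the storage with whichever allocator the size calls for. */
static void *
alloc_bytevector_storage (size_t nbytes, gc_alloc_fn atomic,
                          gc_alloc_fn atomic_ignore_off_page)
{
  assert (atomic != NULL && atomic_ignore_off_page != NULL);
  if (choose_bytevector_allocator (nbytes) == BV_ALLOC_ATOMIC_IGNORE_OFF_PAGE)
    return atomic_ignore_off_page (nbytes);
  return atomic (nbytes);
}
```

Passing the allocators in also makes it easy to try the logic with plain malloc, e.g. alloc_bytevector_storage (64, malloc, malloc).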
In most cases, I think, this would not be a problem. However, if someone makes a new bytevector from the middle of an existing one, the new one might only point to the middle and not the head, so the storage it points into could be collected prematurely (I would need to dig further to see whether the new one would be allocated using make_bytevector_from_buffer). Or, if someone used C code to, say, compute the norm of a vector (a very common operation, often done with BLAS) and the Scheme code was not going to use the bytevector anymore, the only pointer left on the stack might be one to the element the C code is currently reading. As soon as that pointer got past the 512-byte mark, the bytevector might be collected while it is still being worked on, which would be a disaster. So I am not sure the allocation could safely be changed to use GC_MALLOC_ATOMIC_IGNORE_OFF_PAGE for large bytevectors.

I do not know enough about Guile internals yet to know whether typical pure Scheme operations would run into problems. I think it is definitely possible that some FFI cases would, which would mean the programmer has to take extra precautions to prevent collection. That could be a major problem for changing the allocation in Guile 2.0.x and 2.2.x, since it would be a major API change. It would be less of an issue for the 3.x series, since the API could change there, but it would still be a surprising thing for people to have to worry about when using the FFI. I could be wrong on this: a pointer to the head might still be kept on the stack, in which case there is no problem. So it seems that disabling the warning through GC_LARGE_ALLOC_WARN_INTERVAL or some other method is the only safe solution, unless my analysis above is wrong and the allocation code could be safely changed.
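A sketch in C of the norm case and of one possible precaution. squared_norm is a hypothetical function, and it returns the squared norm only to avoid needing libm. Keeping a volatile copy of the head pointer follows my reading of gcinterface.html, which asks that the retained pointer be declared volatile. It assumes data itself points into the first 512 bytes of the allocation, and in practice relies on the compiler giving the volatile local a stack slot that the collector scans.

```c
#include <assert.h>
#include <stddef.h>

/* Squared Euclidean norm of a vector, written the way C code handed a
   large GC-managed buffer might be.  The hazard: once the loop advances
   past the first 512 bytes, an optimizing compiler may keep only an
   interior pointer to the current element, which an ignore-off-page
   allocation would not count as a reference. */
static double
squared_norm (const double *data, size_t n)
{
  assert (data != NULL || n == 0);

  /* Precaution sketch (assumption: DATA points into the first 512 bytes
     of the allocation).  A volatile local must be read back after the
     loop, so in practice it stays in a stack slot the collector scans.
     Guile's scm_remember_upto_here_1 serves a similar purpose for SCM
     values. */
  const double *volatile head = data;

  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += data[i] * data[i];

  (void) head; /* read the volatile pointer so it stays live through the loop */
  return sum;
}
```

For example, squared_norm on {3.0, 4.0} gives 25.0.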
Freja Nordsiek

On 12/31/2017 02:22 PM, David Pirotte wrote:
> Hello,
>
>>> If all you are doing is trying to get Guile not to issue warnings about big
>>> allocations, I think all you need to do is put -DGC_IGNORE_WARN in the
>>> CFLAGS when you build Guile.
>> Thanks for the suggestion, but it does not work.
> For those interested, Mike did find a way to get rid of those warnings, and
> posted it in #guile:
>
> <spk121> daviid: to quell BDW-GC large alloc warnings via environment
> variables, you can set GC_LARGE_ALLOC_WARN_INTERVAL to something much
> larger than its default of 5
>
> which works perfectly, thanks Mike!
>
> David.