Re: [I] Severe sync.Pool degradation and infinite heap allocations in GetColumnBloomFilter due to pool pollution and ResizeNoShrink(0) [arrow-go]

via GitHub Thu, 25 Jun 2026 09:13:26 -0700


zeroshade commented on issue #863:
URL: https://github.com/apache/arrow-go/issues/863#issuecomment-4801696240


   I tried to reproduce this myself and the test you've provided here doesn't 
actually exercise the bloom filter pool.
   
   Since your test sets `BloomFilterOffset = 0`, `GetColumnBloomFilter` returns 
`nil, nil` immediately before reading the header, allocating the bitset, 
touching the pool or registering cleanup. It just returns the raw deserialized 
bytes from thrift. 
   
   It's also important to remember that `TotalAlloc` from `runtime.MemStats` is 
measuring cumulative allocation (churn) not retained memory, so it's not 
evidence of a leak on its own.
   
   If I set a non-zero offset so the read path runs against the pool (`Size()` 
returns the expected 1MB), then the pool reuses the bitset buffer and we get 
~18 allocs/op and ~2.7 KB/op if we promptly return the buffer, the 1MB buffer 
is not reallocated per call. `ResizeNoShrink(0)` preserves the capacity; only 
`Resize(0)` frees the memory. Recreating the `RowGroupBloomFilterReader` each 
iteration doesn't change this as long as the pool is shared.
   
   There *is* a measurable effect that can be improved here due to the 
`runtime.AddCleanup` though. Because cleanup is GC-timing dependant, under a 
burst, the buffers being returned to the pool will lag and we'll see cumulative 
allocations go up (depending on the value of `GOGC` it could be anywhere from 
1.5GB - 4GB from my testing), all eventually reclaimed by the GC. 
   
   Ultimately, this is a churn/latency problem, not a failure of the pool. 
There's definitely room for us to improve and more promptly return the buffer 
to the pool though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] Severe sync.Pool degradation and infinite heap allocations in GetColumnBloomFilter due to pool pollution and ResizeNoShrink(0) [arrow-go]

Reply via email to