zeroshade commented on issue #863: URL: https://github.com/apache/arrow-go/issues/863#issuecomment-4801696240
I tried to reproduce this myself and the test you've provided here doesn't actually exercise the bloom filter pool. Since your test sets `BloomFilterOffset = 0`, `GetColumnBloomFilter` returns `nil, nil` immediately before reading the header, allocating the bitset, touching the pool or registering cleanup. It just returns the raw deserialized bytes from thrift. It's also important to remember that `TotalAlloc` from `runtime.MemStats` is measuring cumulative allocation (churn) not retained memory, so it's not evidence of a leak on its own. If I set a non-zero offset so the read path runs against the pool (`Size()` returns the expected 1MB), then the pool reuses the bitset buffer and we get ~18 allocs/op and ~2.7 KB/op if we promptly return the buffer, the 1MB buffer is not reallocated per call. `ResizeNoShrink(0)` preserves the capacity; only `Resize(0)` frees the memory. Recreating the `RowGroupBloomFilterReader` each iteration doesn't change this as long as the pool is shared. There *is* a measurable effect that can be improved here due to the `runtime.AddCleanup` though. Because cleanup is GC-timing dependant, under a burst, the buffers being returned to the pool will lag and we'll see cumulative allocations go up (depending on the value of `GOGC` it could be anywhere from 1.5GB - 4GB from my testing), all eventually reclaimed by the GC. Ultimately, this is a churn/latency problem, not a failure of the pool. There's definitely room for us to improve and more promptly return the buffer to the pool though. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
