zeroshade commented on PR #864:
URL: https://github.com/apache/arrow-go/pull/864#issuecomment-4848676420

   Following up on my last comment with a concrete alternative direction. I 
measured the original path against a prompt-return path on current `main` (same 
64×100 harness you posted, `b.N=1`, go1.26.4), splitting the profile into 
`alloc_space` (churn) vs `inuse_space` (live after the test's end-of-run GC):
   
   | Path | `alloc_space` (churn) | `inuse_space` (retained) | time/batch |
   |---|---|---|---|
   | Current `main` (return via `runtime.AddCleanup` only) | ~5.1 GB | ~3.0 GB 
| 0.49 s |
   | Return the bitset buffer to the pool **promptly** | ~0.40 GB | **~83 MB** 
| 0.088 s |
   
   So the per-reader `sync.Pool` already reuses correctly — the only problem is 
*when* the buffer goes back. Today the bitset buffer is returned only when the 
`BloomFilter` is GC'd (`addCleanup` → `runtime.AddCleanup`). Under a burst 
those cleanups lag, so `Get()` keeps allocating fresh ~1 MB buffers. Returning 
the same buffer synchronously collapses both churn (~13×) and retained memory 
(~36×) and runs ~5.5× faster. The remaining ~83 MB is ≈ `concurrency × 1 MB` — 
one live buffer per in-flight goroutine, which is exactly the bounded 
steady-state we're after.
   
   The only change from your `BenchmarkGetColumnBloomFilter_OriginalSyncPool` 
that produces the second row is returning the buffer to the pool instead of `_ 
= bf`:
   
   ```go
   bf, err := threadRdr.GetColumnBloomFilter(0)
   if err != nil {
       return
   }
   
   // return the bitset buffer to the pool immediately instead of waiting for GC
   if b, ok := bf.(*blockSplitBloomFilter); ok && b.data != nil {
       if b.cancelCleanup != nil {
           b.cancelCleanup() // cancel the AddCleanup so the buffer isn't 
returned twice
       }
       b.data.ResizeNoShrink(0)
       originalArrowPool.Put(b.data)
   }
   ```
   
   Same harness, same pool, same allocator — only the return timing changes, 
and that alone accounts for the 5.1 GB → 0.40 GB / 3.0 GB → 83 MB difference. 
(There's also a real pool-pollution issue: `file.Reader` hands the bloom reader 
the *same* `sync.Pool` the column/record readers use, so MB-sized bitsets and 
small page buffers churn each other.)
   
   ## Proposal
   
   Keep `GetColumnBloomFilter` and its `AddCleanup` exactly as they are 
(back-compat + a GC safety net so nothing leaks if a caller retains the 
filter), and add an **opt-in prompt-return path**. Two possible shapes — I lean 
toward (A):
   
   **Option A — scoped accessor (preferred):**
   ```go
   func (r *RowGroupBloomFilterReader) VisitColumnBloomFilter(i int, fn 
func(BloomFilter) error) error {
       bf, err := r.GetColumnBloomFilter(i)
       if err != nil || bf == nil {
           return err
       }
       defer r.recycle(bf) // cancelCleanup() + data.ResizeNoShrink(0) + 
pool.Put(data)
       return fn(bf)
   }
   ```
   The buffer returns to the pool the instant the callback ends — 
deterministic, lexically-scoped lifetime, hard to misuse.
   
   **Option B — explicit recycle:**
   ```go
   bf, err := rg.GetColumnBloomFilter(i)
   // ... use bf ...
   rg.RecycleBloomFilter(bf) // cancels the cleanup, ResizeNoShrink(0), Put 
back to pool
   ```
   More flexible for callers who can't wrap usage in a closure, but easier to 
forget (forgetting just falls back to today's GC behavior, so it's safe, only 
slower).
   
   Both call the same helper that cancels the registered cleanup (so the buffer 
isn't double-returned) and puts it back. Either can pair with giving the bloom 
reader its **own per-reader pool**, separate from the page-buffer pool, to fix 
the pollution.
   
   ## Why this over the current PR
   
   The PR reaches a similar steady-state, but the way it gets there trips each 
of the constraints I called out earlier:
   
   - **Public API break** — it adds `Close()` to the `BloomFilter` interface; 
every external implementer/consumer breaks. The proposal adds a method on the 
*reader* and leaves `BloomFilter` untouched.
   - **Allocator bypass** — it reads headers into raw `make([]byte, …)`; the 
proposal keeps everything in allocator-backed `*memory.Buffer`s.
   - **Process-global pools** — it declares package-level 
`headerPool`/`filterPool` channels (currently unused, but the wrong direction); 
the proposal keeps strictly per-reader pools.
   - **Removes the GC safety net** — it drops `AddCleanup` on the read path, so 
a filter backed by a CGO/mallocator allocator leaks if the caller never calls 
`Close()`. The proposal keeps `AddCleanup` as the fallback.
   - **Use-after-recycle** — `Close()` resets the buffer to full cap and pushes 
it into a shared pool while the returned filter's `bitset32` still aliases that 
memory; a concurrent reader can grab and overwrite it. The scoped accessor 
bounds the lifetime so that can't happen.
   
   Net: same bounded-memory result, no interface break, no allocator bypass, no 
global pools, and the existing safety net stays. Happy to put up a PR with 
Option A + the dedicated pool if that direction works for you.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to