avantgardnerio opened a new issue, #22723:
URL: https://github.com/apache/datafusion/issues/22723
### Describe the bug
`NestedLoopJoinExec`'s memory-limited (spill) path allocates more memory
than it reserves through the `MemoryPool`. The accounting-pool framework added
in #22626 surfaces this if `HEADROOM_FACTOR` in the SLT pool is tightened — see
#22721, which lowers it from 8.0 → 5.0 and trips this test.
### To Reproduce
In `datafusion/sqllogictest/src/accounting_pool.rs`, set:
\`\`\`rust
const HEADROOM_FACTOR: f64 = 5.0;
\`\`\`
Then run `nested_loop_join_spill.slt`. First query at line 33:
\`\`\`sql
SET datafusion.execution.target_partitions = 1;
SET datafusion.runtime.memory_limit = '150K';
SELECT count(*) as cnt, min(v1) as mn, max(v1) as mx
FROM generate_series(1, 100000) AS t1(v1)
INNER JOIN generate_series(1, 1) AS t2(v2)
ON (t1.v1 + t2.v2) > 0;
\`\`\`
### Expected behavior
Query succeeds with allocator usage bounded by the configured `memory_limit`
(×~10% slop for legitimate untracked overhead like Tokio/Rayon thread state).
### Actual behavior
\`\`\`
External error: 1 errors in file
datafusion/sqllogictest/test_files/nested_loop_join_spill.slt
1. query failed: Other Error: allocator overdraft: account balance at panic
= -20245 bytes
at nested_loop_join_spill.slt:33
\`\`\`
Peak allocator usage reaches ~770KB against a declared 150KB pool — 5.13×
the budget. The first ~750KB (5×) is absorbed by the headroom factor; the
additional 20245 bytes is allocator usage that was never reserved through
`MemoryPool::try_grow`.
### Likely root cause
Spill path in `NestedLoopJoinExec` has at least these untracked allocation
sites (need verification with targeted instrumentation):
- `generate_next_batch` buffering — `RecordBatch`es accumulated for the
build side before the spill decision
- `concat_batches` at the spill boundary — copies into a coalesced batch
before the IPC writer sees it
- `take_native` during the probe phase — gather kernels allocate output
buffers
- IPC reader path on spill re-read — the `StreamReader`/`FileReader` decoder
owns buffers that aren't accounted
The first query in the test stays small (100K × i32 = ~400KB build side), so
the overshoot is whatever lives off-pool in the spill setup, not the bulk data
itself.
### Additional context
- Surfaced by the allocator-vs-pool tracker added in #22626.
- Demonstration PR: #22721 (test-only: lowers `HEADROOM_FACTOR`; expected to
fail until this is fixed).
- Fix should reserve memory through `MemoryReservation` before each of the
sites above, not bump the headroom constant — the constant is a leak detector,
not a budget.
### Component
- [x] Physical Plan
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]