Re: [I] NestedLoopJoinExec spill path: untracked allocation overshoots memory pool [datafusion]

via GitHub Tue, 02 Jun 2026 19:50:40 -0700


2010YOUY01 commented on issue #22723:
URL: https://github.com/apache/datafusion/issues/22723#issuecomment-4608669994


   I think, given the current implementation conventions, this kind of 
validation is not feasible in this failing SLT.
   
   ```sql
   SET datafusion.execution.target_partitions = 1;
   SET datafusion.runtime.memory_limit = '150K';
   
   SELECT count(*) as cnt, min(v1) as mn, max(v1) as mx
   FROM generate_series(1, 100000) AS t1(v1)
   INNER JOIN generate_series(1, 1) AS t2(v2)
   ON (t1.v1 + t2.v2) > 0;
   ```
   
   ## Issue 1 -- operators don't track in-progress batches
   
   For simplicity, operators don't account for in-progress batch memory. This is 
typically one batch inside each operator per partition. If the batches are wide 
or the partition count is high, this can become significant.
   
   This query already contains a substantial amount of untracked in-progress 
memory:
   
   - `generate_series` holds one temporary batch: 8K rows * 8 bytes = 64KB
   - NLJ holds a batch for evaluating the filter `(v1, v2, 0)`, plus a result 
batch: roughly 8K rows * 8 bytes * 3 = 192KB
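   
   As a rough cross-check of the numbers above, here is a minimal sketch of the 
arithmetic. It assumes a batch size of 8192 rows, 8-byte integer columns, and 
that `'150K'` means 150 * 1024 bytes; `batch_bytes` is a hypothetical helper 
for illustration, not a DataFusion API.
   
   ```rust
   // Back-of-envelope estimate of untracked in-progress batch memory.
   // All constants are assumptions taken from the figures in this comment.
   
   const BATCH_ROWS: usize = 8192; // assumed batch size ("8K rows")
   const INT64_BYTES: usize = 8; // assumed width of each integer value
   
   /// Bytes held by one batch with `columns` 8-byte columns.
   /// Counts value buffers only; ignores validity bitmaps and allocator overhead.
   fn batch_bytes(columns: usize) -> usize {
       BATCH_ROWS * INT64_BYTES * columns
   }
   
   fn main() {
       let series = batch_bytes(1); // temporary batch inside generate_series
       let nlj = batch_bytes(3); // NLJ filter input plus result, "roughly * 3"
       let untracked = series + nlj;
       let limit = 150 * 1024; // '150K', assumed to mean KiB
   
       println!("generate_series: {} KiB", series / 1024);
       println!("NLJ:             {} KiB", nlj / 1024);
       println!("untracked total: {} KiB", untracked / 1024);
       println!("pool limit:      {} KiB", limit / 1024);
       println!("untracked exceeds limit: {}", untracked > limit);
   }
   ```
   
   Under these assumptions the untracked in-progress memory alone is larger 
than the configured pool limit, before any allocator effects are considered.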
   
   Given the current implementation, this behavior is expected and not specific 
to NLJ. Addressing it would require updating all operators to account for 
in-progress batches.
   
   ## Issue 2 -- memory allocator noise
   
   I'm assuming the 770KB measurement is allocator RSS. This may also include 
internal fragmentation, or memory that the allocator has not yet returned to 
the OS. In such cases RSS can appear high, even though the allocator could 
potentially release the memory immediately if the OS requests it.
   
   ## Ideas for follow-up work
   
   Overall, I don't think issue 1 is easy to solve, and perhaps these very 
small memory-limited queries should not be used for allocator-vs-pool 
validation. They are still useful as correctness tests.
   
   This kind of validation is more feasible when `memory pool limit >> single 
batch size`. That likely means using memory limits in the hundreds of MB range 
and adding a separate extended test suite with dedicated test cases.
   
   Issue 2 remains a challenge, and I'm not sure how to make the validation 
more accurate. It likely requires a deeper investigation into allocator and OS 
memory accounting behavior.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

