bharath-techie commented on issue #19386:
URL: https://github.com/apache/datafusion/issues/19386#issuecomment-3675139619
I also ran into a related problem while working on this issue. Say you limit
memory to 4 GB and `GroupedHashAggregateStream` starts to spill. For the `URL`
field, each spill writes the entire record batch, so the total amount spilled
to disk exceeds 100 GB and the query fails with a resource exhausted error.
The same fix works there: if I `gc` the string view in the spill manager
before spilling, only the current record batch is spilled.
This is easy to reproduce.
Take one ClickBench partitioned Parquet file (~120 MB):
```
RUST_LOG=datafusion_physical_plan=debug ./datafusion-cli -m 40m --disk-spill-path /home/ec2-user/spilldir --disk-limit 75g
```
```
SET datafusion.execution.listing_table_ignore_subdirectory = false;
SET datafusion.execution.target_partitions=1;
SET datafusion.execution.parquet.binary_as_string=true;
CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION '/home/ec2-user/hits_0.parquet';
SELECT "URL", COUNT(*) AS c FROM hits GROUP BY "URL" ORDER BY c DESC LIMIT 10;
```
Before fix:
```
[2025-12-19T07:44:47Z DEBUG datafusion_physical_plan::spill::in_progress_spill_file] [SPILL_FILE] Finished spill file: path="/home/ec2-user/spill/datafusion-xKj4Qt/.tmpoxkWIz", size=820.54 MB, total_spilled_bytes=820.54 MB, total_spill_files=1
```
After fix:
```
[2025-12-19T07:46:54Z DEBUG datafusion_physical_plan::spill::in_progress_spill_file] [SPILL_FILE] Finished spill file: path="home/ec2-user/spill/datafusion-3z9mL6/.tmpF7hNi9", size=33.43 MB, total_spilled_bytes=33.43 MB, total_spill_files=1
```
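To illustrate why running `gc` before the spill shrinks the file, here is a rough, standard-library-only sketch of the effect. It is a toy model, not the Arrow or DataFusion code: `ToyViewColumn`, `spilled_bytes`, and the other names are hypothetical, and it ignores details of the real `StringViewArray` such as inlined short strings and multiple data buffers. The assumption it encodes is that a sliced view column still points at the full shared buffer, so writing it out as-is writes every byte of that buffer.

```rust
use std::rc::Rc;

/// Toy stand-in for a view-based string column (hypothetical, not the Arrow type).
/// Each view is an (offset, len) pair into one shared data buffer.
struct ToyViewColumn {
    buffer: Rc<Vec<u8>>,
    views: Vec<(usize, usize)>,
}

impl ToyViewColumn {
    /// Build a column by appending every value to a single buffer.
    fn from_strs(values: &[&str]) -> Self {
        let mut buffer = Vec::new();
        let mut views = Vec::with_capacity(values.len());
        for v in values {
            views.push((buffer.len(), v.len()));
            buffer.extend_from_slice(v.as_bytes());
        }
        Self { buffer: Rc::new(buffer), views }
    }

    /// Zero-copy slice: fewer views, but the same (full) buffer is retained.
    fn slice(&self, start: usize, len: usize) -> Self {
        Self {
            buffer: Rc::clone(&self.buffer),
            views: self.views[start..start + len].to_vec(),
        }
    }

    /// Bytes a naive spill would write: the whole referenced buffer.
    fn spilled_bytes(&self) -> usize {
        self.buffer.len()
    }

    /// Compact the column: copy only the bytes the views still reference.
    fn gc(&self) -> Self {
        let mut buffer = Vec::new();
        let mut views = Vec::with_capacity(self.views.len());
        for &(off, len) in &self.views {
            views.push((buffer.len(), len));
            buffer.extend_from_slice(&self.buffer[off..off + len]);
        }
        Self { buffer: Rc::new(buffer), views }
    }

    /// Read back the i-th value through its view.
    fn value(&self, i: usize) -> &str {
        let (off, len) = self.views[i];
        std::str::from_utf8(&self.buffer[off..off + len]).unwrap()
    }
}
```

In this model, slicing one row out of a column and calling `spilled_bytes()` still reports the size of the whole buffer, while calling `gc()` first reports only that row's bytes. That matches the direction of the before and after spill sizes in the logs above, though the real numbers depend on Arrow's actual layout.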
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.