andygrove opened a new pull request, #3869:
URL: https://github.com/apache/datafusion-comet/pull/3869

   ## Which issue does this PR close?
   
   Related to investigating off-heap memory usage in Comet vs Spark.
   
   ## Rationale for this change
   
   When running TPC-H at 1TB scale, Comet requires far more off-heap memory 
than expected (32GB or more, versus 2GB for Gluten). We need tooling to measure 
that memory use and isolate the cause.
   
   ## What changes are included in this PR?
   
   - `benchmarks/tpc/memory-profile.sh` — Script that runs each TPC-H query 
individually under different configurations (Spark-only baseline, Comet with 
varying offHeap sizes) in local mode, wrapping each run with `/usr/bin/time -l` 
to capture peak RSS. Outputs a CSV for easy comparison.
   - `docs/memory-analysis.md` — Analysis document investigating why Comet 
needs more off-heap memory, covering memory pool architecture, untracked memory 
sources, comparison with Gluten's approach, and proposed fixes.
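
For readers who want to reproduce the peak-RSS measurement outside the script, here is a minimal Python sketch of the same idea. It is not the code in `memory-profile.sh`; the function names and CSV columns are hypothetical. It reads the child's resource usage directly via `os.wait4` rather than parsing `/usr/bin/time -l` output, which sidesteps the fact that `-l` is the BSD/macOS flag (GNU time uses `-v`).

```python
import csv
import io
import os
import subprocess
import sys


def run_with_peak_rss(cmd):
    """Run cmd to completion; return (exit_code, peak_rss_bytes) for that child.

    Hypothetical helper, Unix only.
    """
    proc = subprocess.Popen(cmd)
    # os.wait4 reaps this specific child and returns its resource usage.
    _, status, usage = os.wait4(proc.pid, 0)
    # Record the exit code on the Popen object since we reaped it ourselves.
    proc.returncode = os.waitstatus_to_exitcode(status)  # Python 3.9+
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    scale = 1 if sys.platform == "darwin" else 1024
    return proc.returncode, usage.ru_maxrss * scale


def write_csv(rows, out):
    """Write (query, config, exit_code, peak_rss_bytes) rows with a header."""
    writer = csv.writer(out)
    # Column names are assumptions, not the script's actual CSV schema.
    writer.writerow(["query", "config", "exit_code", "peak_rss_bytes"])
    writer.writerows(rows)
```

In practice `cmd` would be the `spark-submit` invocation for a single query under a single configuration, and each result would be appended as one CSV row. Note that this measures only the process launched directly, which matches local mode where the driver and executor share one JVM.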
   
   ## How are these changes tested?
   
   These are developer tools and documentation, not production code. The script 
has been validated locally against TPC-H SF100.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

