Re: [PR] Add benchmark utility to profile peak memory usage [datafusion]

via GitHub Thu, 24 Jul 2025 20:19:32 -0700


ding-young commented on code in PR #16814:
URL: https://github.com/apache/datafusion/pull/16814#discussion_r2230051866



##########
benchmarks/README.md:
##########
@@ -321,6 +322,64 @@ FLAGS:
 ...
 ```
 
+# Profiling Memory Stats for each benchmark query
+The `mem_profile` program wraps benchmark execution to measure memory usage statistics, such as peak RSS. It runs each benchmark query in a separate subprocess, capturing the child process’s stdout to print structured output.
+
+The subcommands supported by `mem_profile` are a subset of those in `dfbench`.
+Currently supported benchmarks: Clickbench, H2o, Imdb, SortTpch, and Tpch.
+
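+Before running benchmarks, `mem_profile` automatically compiles the benchmark binary (`dfbench`) using `cargo build` with the same cargo profile (e.g., `--release`) as `mem_profile` itself. Prebuilding the binary and running each query in a separate process keeps compilation and earlier queries from skewing the measurements. This matters for peak RSS in particular, because it is a high-water mark over the whole process lifetime: queries sharing one process would report each other's peaks.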
+
+Currently, `mem_profile` only supports `mimalloc` as the memory allocator, since it relies on `mimalloc`'s API to collect memory statistics.
+
+Because it runs the compiled binary directly from the target directory, make sure your working directory is the top-level `datafusion/` directory, where `target/` is also located.

Review Comment:
   Here's a fuller description of this utility and the metrics it supports.
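The README describes running each query in a separate subprocess and capturing the child's stdout. The following minimal Rust sketch shows that pattern using `std::process::Command`. `run_and_capture`, `parse_stats`, and the `key: value` line format are hypothetical, because the README does not say what the child actually prints.

```rust
use std::io;
use std::process::Command;

/// Runs `program` with `args` in a child process and returns its captured stdout.
/// A non-zero exit status becomes an error so that a failed query is not
/// reported as if it were a valid measurement.
fn run_and_capture(program: &str, args: &[&str]) -> io::Result<String> {
    // `output()` waits for the child and captures both stdout and stderr.
    let output = Command::new(program).args(args).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{program} exited with {}", output.status),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

/// Parses hypothetical `key: value` lines from the child's stdout into pairs.
/// Lines without a colon are skipped.
fn parse_stats(stdout: &str) -> Vec<(String, String)> {
    stdout
        .lines()
        .filter_map(|line| line.split_once(':'))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect()
}
```

Each query runs in a fresh process, so any per-process counters the child reports (such as peak RSS) cover only that query.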

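The supported subset of benchmarks (Clickbench, H2o, Imdb, SortTpch, Tpch) could be modeled as a closed enum that rejects every other `dfbench` subcommand. This is only a sketch. The `Benchmark` type is hypothetical, and the lowercase subcommand strings (for example `sort-tpch`) are assumptions about how `dfbench` names them.

```rust
use std::str::FromStr;

/// Benchmarks listed as supported in the README (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Benchmark {
    Clickbench,
    H2o,
    Imdb,
    SortTpch,
    Tpch,
}

impl Benchmark {
    /// Assumed `dfbench` subcommand name for each benchmark.
    fn subcommand(self) -> &'static str {
        match self {
            Benchmark::Clickbench => "clickbench",
            Benchmark::H2o => "h2o",
            Benchmark::Imdb => "imdb",
            Benchmark::SortTpch => "sort-tpch",
            Benchmark::Tpch => "tpch",
        }
    }
}

impl FromStr for Benchmark {
    type Err = String;

    /// Accepts only the supported subset; any other subcommand is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "clickbench" => Ok(Benchmark::Clickbench),
            "h2o" => Ok(Benchmark::H2o),
            "imdb" => Ok(Benchmark::Imdb),
            "sort-tpch" => Ok(Benchmark::SortTpch),
            "tpch" => Ok(Benchmark::Tpch),
            other => Err(format!("`{other}` is not supported by mem_profile")),
        }
    }
}
```

A closed enum makes the "subset of `dfbench`" constraint explicit: adding a new supported benchmark means adding a variant and its subcommand mapping in one place.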

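Building `dfbench` with the same cargo profile as `mem_profile` involves two steps: choosing the `cargo build` arguments and knowing where cargo writes the output. The sketch below assumes `--profile <name> --bin dfbench` as the flags and guesses the current profile from `cfg!(debug_assertions)`. All three helpers are hypothetical and not taken from the PR. The directory mapping itself is standard cargo behavior: `dev` and `test` build into `target/debug`, `release` and `bench` into `target/release`, and custom profiles into `target/<name>`.

```rust
use std::path::{Path, PathBuf};

/// Assumed arguments for building dfbench with a given cargo profile.
fn build_args(profile: &str) -> Vec<String> {
    ["build", "--profile", profile, "--bin", "dfbench"]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Rough guess at the profile this program was built with: debug assertions
/// are on by default in `dev` and off in `release`. Custom profiles that
/// change `debug-assertions` would fool this heuristic.
fn current_profile() -> &'static str {
    if cfg!(debug_assertions) { "dev" } else { "release" }
}

/// Path of the dfbench binary cargo produces for `profile` under `target_dir`.
fn binary_path(target_dir: &Path, profile: &str) -> PathBuf {
    let dir = match profile {
        "dev" | "test" => "debug",
        "release" | "bench" => "release",
        custom => custom,
    };
    // EXE_SUFFIX is ".exe" on Windows and empty on Unix-like systems.
    target_dir
        .join(dir)
        .join(format!("dfbench{}", std::env::consts::EXE_SUFFIX))
}
```

Deriving both the build flags and the binary path from one profile name keeps them consistent: the profile `mem_profile` builds with is the profile whose output directory it runs from.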

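The README locates the compiled binary relative to the working directory. A tool like this could therefore fail early, with a clear message, when started from the wrong place. `target_dir_in` below is a hypothetical guard, not something the README says `mem_profile` does, and it only checks for a `target/` subdirectory.

```rust
use std::path::{Path, PathBuf};

/// Returns `dir/target` if it exists, or an error telling the user to run
/// from the top-level datafusion/ directory (hypothetical helper).
fn target_dir_in(dir: &Path) -> Result<PathBuf, String> {
    let target = dir.join("target");
    if target.is_dir() {
        Ok(target)
    } else {
        Err(format!(
            "no target/ directory in {}; run from the top-level datafusion/ directory",
            dir.display()
        ))
    }
}
```

A caller would typically pass the result of `std::env::current_dir()` and stop before building or running anything if this returns an error.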