ding-young commented on code in PR #16814: URL: https://github.com/apache/datafusion/pull/16814#discussion_r2230051866
##########
benchmarks/README.md:
##########
@@ -321,6 +322,64 @@ FLAGS:
 ...
 ```
 
+# Profiling Memory Stats for each benchmark query
+The `mem_profile` program wraps benchmark execution to measure memory usage statistics, such as peak RSS. It runs each benchmark query in a separate subprocess, capturing the child process’s stdout to print structured output.
+
+Subcommands supported by mem_profile are the subset of those in `dfbench`.
+Currently supported benchmarks include: Clickbench, H2o, Imdb, SortTpch, Tpch
+
+Before running benchmarks, `mem_profile` automatically compiles the benchmark binary (`dfbench`) using `cargo build` with the same cargo profile (e.g., --release) as mem_profile itself. By prebuilding the binary and running each query in a separate process, we can ensure accurate memory statistics.
+
+Currently, `mem_profile` only supports `mimalloc` as the memory allocator, since it relies on `mimalloc`'s API to collect memory statistics.
+
+Because it runs the compiled binary directly from the target directory, make sure your working directory is the top-level datafusion/ directory, where the target/ is also located.

Review Comment:
   Here's more description about this utility and supported metrics.
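
For readers following the thread, here is a minimal sketch of the "prebuilt binary, one subprocess per query" flow that the README text describes. It is not the code from this PR: the helper names (`profile_dir`, `dfbench_path`, `query_command`, `run_query`) are hypothetical, the argument list passed to `dfbench` is illustrative only, and the `mimalloc`-based collection of peak RSS is left out entirely.

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Map a cargo profile name to its directory under `target/`.
/// Cargo writes `dev` builds to `target/debug`; custom profiles use their own name.
fn profile_dir(profile: &str) -> &str {
    match profile {
        "dev" | "test" => "debug",
        "bench" => "release",
        other => other,
    }
}

/// Hypothetical helper: where the prebuilt `dfbench` binary is expected to be,
/// relative to the workspace root (on Windows the file would be `dfbench.exe`).
fn dfbench_path(workspace_root: &Path, profile: &str) -> PathBuf {
    workspace_root
        .join("target")
        .join(profile_dir(profile))
        .join("dfbench")
}

/// Hypothetical helper: build the command for a single query.
/// The arguments are an assumption; a real run needs more flags (data path, etc.).
fn query_command(binary: &Path, benchmark: &str, query_id: usize) -> Command {
    let mut cmd = Command::new(binary);
    cmd.arg(benchmark).arg("--query").arg(query_id.to_string());
    cmd
}

/// Run one query in its own child process and return its captured stdout.
fn run_query(mut cmd: Command) -> io::Result<String> {
    // `output()` waits for the child to exit and captures stdout and stderr.
    let output = cmd.output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() {
    // Assumes the working directory is the top-level datafusion/ directory.
    let binary = dfbench_path(Path::new("."), "release");
    println!("expected binary: {}", binary.display());

    if !binary.exists() {
        println!("binary not found; build it first with `cargo build --release`");
        return;
    }

    // One fresh process per query, so memory from one query cannot leak into the next.
    for query_id in 1..=3 {
        match run_query(query_command(&binary, "tpch", query_id)) {
            Ok(stdout) => println!("query {query_id}: captured {} bytes of stdout", stdout.len()),
            Err(err) => eprintln!("query {query_id}: failed to run: {err}"),
        }
    }
}
```

Because the binary path is resolved relative to the workspace root, the sketch only finds `dfbench` when launched from the directory that contains `target/`, which matches the working-directory note in the README text.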