2010YOUY01 commented on code in PR #16814:
URL: https://github.com/apache/datafusion/pull/16814#discussion_r2233621418


##########
benchmarks/README.md:
##########
@@ -321,6 +322,64 @@ FLAGS:
 ...
 ```
 
+# Profiling Memory Stats for each benchmark query
+The `mem_profile` program wraps benchmark execution to measure memory usage 
statistics, such as peak RSS. It runs each benchmark query in a separate 
subprocess, capturing the child process’s stdout to print structured output.
+
+Subcommands supported by mem_profile are the subset of those in `dfbench`.
+Currently supported benchmarks include: Clickbench, H2o, Imdb, SortTpch, Tpch
+
+Before running benchmarks, `mem_profile` automatically compiles the benchmark 
binary (`dfbench`) using `cargo build` with the same cargo profile (e.g., 
--release) as mem_profile itself. By prebuilding the binary and running each 
query in a separate process, we can ensure accurate memory statistics.
+
+Currently, `mem_profile` only supports `mimalloc` as the memory allocator, 
since it relies on `mimalloc`'s API to collect memory statistics.
+
+Because it runs the compiled binary directly from the target directory, make 
sure your working directory is the top-level datafusion/ directory, where the 
target/ is also located. 

Review Comment:
   I suggest adding this requirement (it should be run under `datafusion/`) to 
the error message.
   
   Currently, the following panic is triggered if it is run from a different 
directory:
   ```
   Benchmark binary built successfully.
   
   thread 'main' panicked at benchmarks/src/bin/mem_profile.rs:152:10:
   Failed to start benchmark: Os { code: 2, kind: NotFound, message: "No such 
file or directory" }
   ```
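One possible shape for this, as a rough sketch only: `spawn_benchmark` is a hypothetical helper, not the code at `mem_profile.rs:152`, and the message wording is an assumption. It maps a `NotFound` spawn error to a message that mentions the working-directory requirement instead of panicking with the raw OS error.

```rust
use std::io;
use std::path::Path;
use std::process::{Child, Command};

/// Hypothetical helper (not from the PR): spawn the benchmark binary and,
/// if the binary cannot be found, return an error that includes a hint
/// about the expected working directory.
fn spawn_benchmark(binary: &Path, args: &[&str]) -> Result<Child, String> {
    Command::new(binary).args(args).spawn().map_err(|e| {
        if e.kind() == io::ErrorKind::NotFound {
            // Assumed wording; adjust to match the project's error style.
            format!(
                "Failed to start benchmark: {e}. Binary not found at `{}`; \
                 make sure mem_profile is run from the top-level `datafusion/` directory",
                binary.display()
            )
        } else {
            format!("Failed to start benchmark: {e}")
        }
    })
}
```

The caller can then print the returned message and exit, so the user sees the hint rather than a bare "No such file or directory".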



##########
benchmarks/src/util/memory.rs:
##########
@@ -0,0 +1,54 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+/// Print Peak RSS, Peak Commit, Page Faults based on mimalloc api
+pub fn print_memory_stats() {
+    #[cfg(all(feature = "mimalloc", feature = "mimalloc_extended"))]
+    {
+        use datafusion::execution::memory_pool::human_readable_size;
+        let mut peak_rss = 0;
+        let mut peak_commit = 0;
+        let mut page_faults = 0;
+        unsafe {
+            libmimalloc_sys::mi_process_info(
+                std::ptr::null_mut(),
+                std::ptr::null_mut(),
+                std::ptr::null_mut(),
+                std::ptr::null_mut(),
+                &mut peak_rss,
+                std::ptr::null_mut(),
+                &mut peak_commit,
+                &mut page_faults,
+            );
+        }
+
+        println!(

Review Comment:
   I recommend adding a comment here: when changing this output format, make 
sure the parser in `mem_profile.rs` stays compatible.
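To illustrate the coupling, here is a minimal sketch. The label `"Peak RSS: "` and both function names are hypothetical, since the actual `println!` format is not shown in this hunk. The idea is that the printer and the parser share one label, and the comment sits next to it.

```rust
// Hypothetical label; the real output format in the PR may differ.
// NOTE: when changing this format, make sure the parser in
// `mem_profile.rs` stays compatible.
const PEAK_RSS_LABEL: &str = "Peak RSS: ";

/// Format one stats line (stand-in for the `println!` in `print_memory_stats`).
fn format_peak_rss(bytes: usize) -> String {
    format!("{}{}", PEAK_RSS_LABEL, bytes)
}

/// Parse a line produced by `format_peak_rss`; returns `None` for other lines.
fn parse_peak_rss(line: &str) -> Option<usize> {
    line.strip_prefix(PEAK_RSS_LABEL)?.trim().parse().ok()
}
```

Sharing a constant is only one option; a plain comment at the `println!` call, as suggested above, would also be enough.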



##########
benchmarks/README.md:
##########
@@ -321,6 +322,64 @@ FLAGS:
 ...
 ```
 
+# Profiling Memory Stats for each benchmark query
+The `mem_profile` program wraps benchmark execution to measure memory usage 
statistics, such as peak RSS. It runs each benchmark query in a separate 
subprocess, capturing the child process’s stdout to print structured output.
+
+Subcommands supported by mem_profile are the subset of those in `dfbench`.
+Currently supported benchmarks include: Clickbench, H2o, Imdb, SortTpch, Tpch
+
+Before running benchmarks, `mem_profile` automatically compiles the benchmark 
binary (`dfbench`) using `cargo build` with the same cargo profile (e.g., 
--release) as mem_profile itself. By prebuilding the binary and running each 
query in a separate process, we can ensure accurate memory statistics.
+
+Currently, `mem_profile` only supports `mimalloc` as the memory allocator, 
since it relies on `mimalloc`'s API to collect memory statistics.
+
+Because it runs the compiled binary directly from the target directory, make 
sure your working directory is the top-level datafusion/ directory, where the 
target/ is also located. 
+
+Example: 
+```shell
+datafusion$ cargo run --profile release-nonlto --bin mem_profile -- tpch 
--path benchmarks/data/tpch_sf1 --partitions 4 --format parquet

Review Comment:
   Let's add an explanation for the options after `--`: they should be passed 
through to `dfbench`, right?


