alamb commented on code in PR #17021:
URL: https://github.com/apache/datafusion/pull/17021#discussion_r2263831579


##########
datafusion-cli/src/exec.rs:
##########
@@ -227,9 +227,19 @@ pub(super) async fn exec_and_print(
 
     let statements = DFParser::parse_sql_with_dialect(&sql, dialect.as_ref())?;
     for statement in statements {
+        let _mem_handle = if print_options.memory_profiling {
+            // RAII guard: dropping the handle disables profiling after 
execution
+            Some(ctx.enable_memory_profiling())
+        } else {
+            None
+        };
         StatementExecutor::new(statement)
             .execute(ctx, print_options)
             .await?;
+        // disable after each statement

Review Comment:
   If you are going to use a RAAI style thing I would expect the `Drop` impl to 
disable profiling 🤔 



##########
datafusion-cli/src/print_options.rs:
##########
@@ -73,6 +73,7 @@ pub struct PrintOptions {
     pub quiet: bool,
     pub maxrows: MaxRows,
     pub color: bool,
+    pub memory_profiling: bool,

Review Comment:
   memory profiling seems like something that should be attached to the session 
context, as it is not related to formatting output



##########
datafusion-cli/src/exec.rs:
##########
@@ -54,7 +54,7 @@ use tokio::signal;
 pub async fn exec_from_commands(
     ctx: &dyn CliSessionContext,
     commands: Vec<String>,
-    print_options: &PrintOptions,
+    print_options: &mut PrintOptions,

Review Comment:
   It seems strange that now PrintOptions are mutable - why would executing a 
command change the options?



##########
datafusion/core/src/execution/context/mod.rs:
##########
@@ -90,6 +90,244 @@ use chrono::{DateTime, Utc};
 use object_store::ObjectStore;
 use parking_lot::RwLock;
 use url::Url;
+/// Memory profiling report for a query.
+#[derive(Debug, Clone, Default, PartialEq, Eq)]
+pub struct MemoryReport {
+    metrics: std::collections::HashMap<String, usize>,
+}
+
+impl MemoryReport {
+    /// Create a new [`MemoryReport`] from the provided metrics.
+    pub fn new(metrics: std::collections::HashMap<String, usize>) -> Self {
+        Self { metrics }
+    }
+
+    /// Returns `true` if the report contains no metrics.
+    pub fn is_empty(&self) -> bool {
+        self.metrics.is_empty()
+    }
+
+    /// Number of tracked operators in the report.
+    pub fn len(&self) -> usize {
+        self.metrics.len()
+    }
+
+    /// Consume the report and return the underlying metrics.
+    pub fn into_inner(self) -> std::collections::HashMap<String, usize> {
+        self.metrics
+    }
+}
+
+impl std::ops::Deref for MemoryReport {
+    type Target = std::collections::HashMap<String, usize>;
+
+    fn deref(&self) -> &Self::Target {
+        &self.metrics
+    }
+}
+
+impl From<std::collections::HashMap<String, usize>> for MemoryReport {
+    fn from(metrics: std::collections::HashMap<String, usize>) -> Self {
+        Self::new(metrics)
+    }
+}
+// Implement IntoIterator for &MemoryReport to allow iterating over &report
+impl<'a> IntoIterator for &'a MemoryReport {
+    type Item = (&'a String, &'a usize);
+    type IntoIter = std::collections::hash_map::Iter<'a, String, usize>;
+
+    fn into_iter(self) -> Self::IntoIter {
+        self.metrics.iter()
+    }
+}
+/// Enhanced memory profiling report with categorization and analysis
+#[derive(Debug)]
+pub struct EnhancedMemoryReport {

Review Comment:
   I wonder why you chose to put this in the session/execution context and not 
in the existing memory pool 
https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/index.html
   
   In particular, I think this code has a lot of similarity to 
https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.TrackConsumersPool.html
   
   Perhaps we can modify that pool or add another wrapper pool for DataFusion 
CLI that saves the maximum consumers, etc



##########
datafusion/core/src/execution/context/mod.rs:
##########
@@ -90,6 +90,244 @@ use chrono::{DateTime, Utc};
 use object_store::ObjectStore;
 use parking_lot::RwLock;
 use url::Url;
+/// Memory profiling report for a query.
+#[derive(Debug, Clone, Default, PartialEq, Eq)]
+pub struct MemoryReport {
+    metrics: std::collections::HashMap<String, usize>,

Review Comment:
   can we please document what this hash map contains (is it memory consumer 
names?)



##########
datafusion-cli/src/exec.rs:
##########
@@ -227,9 +227,19 @@ pub(super) async fn exec_and_print(
 
     let statements = DFParser::parse_sql_with_dialect(&sql, dialect.as_ref())?;
     for statement in statements {
+        let _mem_handle = if print_options.memory_profiling {
+            // RAII guard: dropping the handle disables profiling after 
execution

Review Comment:
   I don't understand why this is named with a leading underscore given it is 
referenced below
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to