alamb commented on code in PR #17021: URL: https://github.com/apache/datafusion/pull/17021#discussion_r2263831579
########## datafusion-cli/src/exec.rs: ########## @@ -227,9 +227,19 @@ pub(super) async fn exec_and_print( let statements = DFParser::parse_sql_with_dialect(&sql, dialect.as_ref())?; for statement in statements { + let _mem_handle = if print_options.memory_profiling { + // RAII guard: dropping the handle disables profiling after execution + Some(ctx.enable_memory_profiling()) + } else { + None + }; StatementExecutor::new(statement) .execute(ctx, print_options) .await?; + // disable after each statement Review Comment: If you are going to use a RAAI style thing I would expect the `Drop` impl to disable profiling 🤔 ########## datafusion-cli/src/print_options.rs: ########## @@ -73,6 +73,7 @@ pub struct PrintOptions { pub quiet: bool, pub maxrows: MaxRows, pub color: bool, + pub memory_profiling: bool, Review Comment: memory profiling seems like something that should be attached to the session context, as it is not related to formatting output ########## datafusion-cli/src/exec.rs: ########## @@ -54,7 +54,7 @@ use tokio::signal; pub async fn exec_from_commands( ctx: &dyn CliSessionContext, commands: Vec<String>, - print_options: &PrintOptions, + print_options: &mut PrintOptions, Review Comment: It seems strange that now PrintOptions are mutable - why would executing a command change the options? ########## datafusion/core/src/execution/context/mod.rs: ########## @@ -90,6 +90,244 @@ use chrono::{DateTime, Utc}; use object_store::ObjectStore; use parking_lot::RwLock; use url::Url; +/// Memory profiling report for a query. +#[derive(Debug, Clone, Default, PartialEq, Eq)] +pub struct MemoryReport { + metrics: std::collections::HashMap<String, usize>, +} + +impl MemoryReport { + /// Create a new [`MemoryReport`] from the provided metrics. + pub fn new(metrics: std::collections::HashMap<String, usize>) -> Self { + Self { metrics } + } + + /// Returns `true` if the report contains no metrics. + pub fn is_empty(&self) -> bool { + self.metrics.is_empty() + } + + /// Number of tracked operators in the report. + pub fn len(&self) -> usize { + self.metrics.len() + } + + /// Consume the report and return the underlying metrics. + pub fn into_inner(self) -> std::collections::HashMap<String, usize> { + self.metrics + } +} + +impl std::ops::Deref for MemoryReport { + type Target = std::collections::HashMap<String, usize>; + + fn deref(&self) -> &Self::Target { + &self.metrics + } +} + +impl From<std::collections::HashMap<String, usize>> for MemoryReport { + fn from(metrics: std::collections::HashMap<String, usize>) -> Self { + Self::new(metrics) + } +} +// Implement IntoIterator for &MemoryReport to allow iterating over &report +impl<'a> IntoIterator for &'a MemoryReport { + type Item = (&'a String, &'a usize); + type IntoIter = std::collections::hash_map::Iter<'a, String, usize>; + + fn into_iter(self) -> Self::IntoIter { + self.metrics.iter() + } +} +/// Enhanced memory profiling report with categorization and analysis +#[derive(Debug)] +pub struct EnhancedMemoryReport { Review Comment: I wonder why you chose to put this in the session/execution context and not in the existing memory pool https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/index.html In particular, I think this code has a lot of similarity to https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.TrackConsumersPool.html Perhaps we can modify that pool or add another wrapper pool for DataFusion CLI that saves the maximum consumers, etc ########## datafusion/core/src/execution/context/mod.rs: ########## @@ -90,6 +90,244 @@ use chrono::{DateTime, Utc}; use object_store::ObjectStore; use parking_lot::RwLock; use url::Url; +/// Memory profiling report for a query. +#[derive(Debug, Clone, Default, PartialEq, Eq)] +pub struct MemoryReport { + metrics: std::collections::HashMap<String, usize>, Review Comment: can we please document what this hash map contains (is it memory consumer names?) ########## datafusion-cli/src/exec.rs: ########## @@ -227,9 +227,19 @@ pub(super) async fn exec_and_print( let statements = DFParser::parse_sql_with_dialect(&sql, dialect.as_ref())?; for statement in statements { + let _mem_handle = if print_options.memory_profiling { + // RAII guard: dropping the handle disables profiling after execution Review Comment: I don't understand why this is named with a leading underscore given it is referenced below -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org