alamb commented on code in PR #17021:
URL: https://github.com/apache/datafusion/pull/17021#discussion_r2263831579
##########
datafusion-cli/src/exec.rs:
##########
@@ -227,9 +227,19 @@ pub(super) async fn exec_and_print(
let statements = DFParser::parse_sql_with_dialect(&sql, dialect.as_ref())?;
for statement in statements {
+ let _mem_handle = if print_options.memory_profiling {
+ // RAII guard: dropping the handle disables profiling after
execution
+ Some(ctx.enable_memory_profiling())
+ } else {
+ None
+ };
StatementExecutor::new(statement)
.execute(ctx, print_options)
.await?;
+ // disable after each statement
Review Comment:
If you are going to use a RAAI style thing I would expect the `Drop` impl to
disable profiling 🤔
##########
datafusion-cli/src/print_options.rs:
##########
@@ -73,6 +73,7 @@ pub struct PrintOptions {
pub quiet: bool,
pub maxrows: MaxRows,
pub color: bool,
+ pub memory_profiling: bool,
Review Comment:
memory profiling seems like something that should be attached to the session
context, as it is not related to formatting output
##########
datafusion-cli/src/exec.rs:
##########
@@ -54,7 +54,7 @@ use tokio::signal;
pub async fn exec_from_commands(
ctx: &dyn CliSessionContext,
commands: Vec<String>,
- print_options: &PrintOptions,
+ print_options: &mut PrintOptions,
Review Comment:
It seems strange that now PrintOptions are mutable - why would executing a
command change the options?
##########
datafusion/core/src/execution/context/mod.rs:
##########
@@ -90,6 +90,244 @@ use chrono::{DateTime, Utc};
use object_store::ObjectStore;
use parking_lot::RwLock;
use url::Url;
+/// Memory profiling report for a query.
+#[derive(Debug, Clone, Default, PartialEq, Eq)]
+pub struct MemoryReport {
+ metrics: std::collections::HashMap<String, usize>,
+}
+
+impl MemoryReport {
+ /// Create a new [`MemoryReport`] from the provided metrics.
+ pub fn new(metrics: std::collections::HashMap<String, usize>) -> Self {
+ Self { metrics }
+ }
+
+ /// Returns `true` if the report contains no metrics.
+ pub fn is_empty(&self) -> bool {
+ self.metrics.is_empty()
+ }
+
+ /// Number of tracked operators in the report.
+ pub fn len(&self) -> usize {
+ self.metrics.len()
+ }
+
+ /// Consume the report and return the underlying metrics.
+ pub fn into_inner(self) -> std::collections::HashMap<String, usize> {
+ self.metrics
+ }
+}
+
+impl std::ops::Deref for MemoryReport {
+ type Target = std::collections::HashMap<String, usize>;
+
+ fn deref(&self) -> &Self::Target {
+ &self.metrics
+ }
+}
+
+impl From<std::collections::HashMap<String, usize>> for MemoryReport {
+ fn from(metrics: std::collections::HashMap<String, usize>) -> Self {
+ Self::new(metrics)
+ }
+}
+// Implement IntoIterator for &MemoryReport to allow iterating over &report
+impl<'a> IntoIterator for &'a MemoryReport {
+ type Item = (&'a String, &'a usize);
+ type IntoIter = std::collections::hash_map::Iter<'a, String, usize>;
+
+ fn into_iter(self) -> Self::IntoIter {
+ self.metrics.iter()
+ }
+}
+/// Enhanced memory profiling report with categorization and analysis
+#[derive(Debug)]
+pub struct EnhancedMemoryReport {
Review Comment:
I wonder why you chose to put this in the session/execution context and not
in the existing memory pool
https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/index.html
In particular, I think this code has a lot of similarity to
https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.TrackConsumersPool.html
Perhaps we can modify that pool or add another wrapper pool for DataFusion
CLI that saves the maximum consumers, etc
##########
datafusion/core/src/execution/context/mod.rs:
##########
@@ -90,6 +90,244 @@ use chrono::{DateTime, Utc};
use object_store::ObjectStore;
use parking_lot::RwLock;
use url::Url;
+/// Memory profiling report for a query.
+#[derive(Debug, Clone, Default, PartialEq, Eq)]
+pub struct MemoryReport {
+ metrics: std::collections::HashMap<String, usize>,
Review Comment:
can we please document what this hash map contains (is it memory consumer
names?)
##########
datafusion-cli/src/exec.rs:
##########
@@ -227,9 +227,19 @@ pub(super) async fn exec_and_print(
let statements = DFParser::parse_sql_with_dialect(&sql, dialect.as_ref())?;
for statement in statements {
+ let _mem_handle = if print_options.memory_profiling {
+ // RAII guard: dropping the handle disables profiling after
execution
Review Comment:
I don't understand why this is named with a leading underscore given it is
referenced below
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]