alamb commented on code in PR #17943:
URL: https://github.com/apache/datafusion/pull/17943#discussion_r2411811405
##########
datafusion/core/tests/memory_limit/mod.rs:
##########
@@ -885,15 +941,30 @@ impl TestCase {
"Unexpected failure when running, expected success but
got: {e}"
)
} else {
+ let err_msg = normalize_oom_errors(&e.to_string());
for error_substring in expected_errors {
- assert_contains!(e.to_string(), error_substring);
+ assert_contains!(&err_msg, error_substring);
}
}
}
}
}
}
+fn normalize_oom_errors(err: &str) -> String {
Review Comment:
this is pretty clever
##########
datafusion/execution/src/memory_pool/mod.rs:
##########
@@ -278,6 +281,17 @@ impl MemoryConsumer {
name: name.into(),
can_spill: false,
id: Self::new_unique_id(),
+ parent_id: None,
+ }
+ }
+
+ /// Create a new [`MemoryConsumer`] with a parent consumer ID for lineage
tracking
+ pub fn new_with_parent(name: impl Into<String>, parent_id: usize) -> Self {
+ Self {
+ name: name.into(),
+ can_spill: false,
Review Comment:
Rather than add a bunch of new APIs for creating new with parent id, how
about just adding a method like
```shell
fn with_parent_id(mut self, parent_id: ..) -> Self {
..
}
```
That mirrors the other fields?
That might also make the defaults less confusing
##########
datafusion/execution/src/memory_pool/mod.rs:
##########
@@ -240,6 +240,7 @@ pub struct MemoryConsumer {
name: String,
can_spill: bool,
id: usize,
+ parent_id: Option<usize>,
Review Comment:
it would be nice to expand out the memory consumer documentation to mention
how the parent/child relationship was used
##########
datafusion/execution/src/memory_pool/mod.rs:
##########
@@ -462,6 +482,32 @@ impl MemoryReservation {
}
}
+ /// Create a new [`MemoryReservation`] with a new [`MemoryConsumer] that
+ /// is a child of this reservation's consumer.
+ ///
+ /// This is useful for creating memory consumers with lineage tracking.
+ pub fn new_child_reservation(
+ &self,
+ name: impl Into<String>,
+ can_spill: bool,
Review Comment:
I am not sure about always passing `can_spill` here -- I think we can use
the existing `with_can_spill` API instead of forcing a bunch of `bool`
parameters
##########
datafusion/execution/src/memory_pool/mod.rs:
##########
@@ -462,6 +482,32 @@ impl MemoryReservation {
}
}
+ /// Create a new [`MemoryReservation`] with a new [`MemoryConsumer] that
+ /// is a child of this reservation's consumer.
+ ///
+ /// This is useful for creating memory consumers with lineage tracking.
+ pub fn new_child_reservation(
+ &self,
+ name: impl Into<String>,
+ can_spill: bool,
+ ) -> MemoryReservation {
+ MemoryConsumer::new_with_parent(name, self.consumer().id())
+ .with_can_spill(can_spill)
+ .register(&self.registration.pool)
+ }
+
+ /// Create a new [`MemoryReservation`] which is a clone to the current
+ /// [`MemoryReservation`]. This means that it's a cloned [`MemoryConsumer`]
+ /// with the same configuration, but a new unique ID.
+ ///
+ /// This is useful for creating memory consumers with lineage tracking,
+ /// while dealing with multithreaded scenarios.
+ pub fn cloned_reservation(&self) -> MemoryReservation {
Review Comment:
It is strange to me that `clone_with_new_id` doesn't also register the
reservation 🤔
As now users have two APIs to think about and figure out which to use
Could we unify them somehow (e.g. maybe deprecate `clone_with_new_id` (as a
follow on PR)?)
##########
datafusion/datasource-parquet/src/file_format.rs:
##########
@@ -1301,7 +1303,9 @@ impl FileSink for ParquetSink {
let props = parquet_props.clone();
let skip_arrow_metadata =
self.parquet_options.global.skip_arrow_metadata;
let parallel_options_clone = parallel_options.clone();
- let pool = Arc::clone(context.memory_pool());
+ // Create a reservation for the parallel parquet writing
+ let reservation =
MemoryConsumer::new("ParquetSink(ParallelWriter)")
Review Comment:
should this be a new child reservation of the `reservation` above ? Or
maybe I am misreading the diff and it already is
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]