wiedld commented on code in PR #17943:
URL: https://github.com/apache/datafusion/pull/17943#discussion_r2408421334
##########
datafusion/core/tests/memory_limit/mod.rs:
##########
@@ -374,8 +393,38 @@ async fn oom_parquet_sink() {
path.to_string_lossy()
))
.with_expected_errors(vec![
- "Failed to allocate additional",
- "for ParquetSink(ArrowColumnWriter)",
+ "Resources exhausted: Additional allocation failed for
ParquetSink(ArrowColumnWriter(col=1)) with top memory consumers (across
reservations) as:
+ ParquetSink(ArrowColumnWriter(col=8))#ID(can spill: false) consumed x KB,
peak x KB:
+stack backtrace:
+ 0: ParquetSink(ArrowColumnWriter(col=8))#ID(can spill: false) consumed x
KB, peak x KB
+ 1: ParquetSink(ParallelColumnWriters)#ID(can spill: false) consumed x B,
peak x B
+ 2: ParquetSink(ParallelWriter)#ID(can spill: false) consumed x B, peak x B
+,
+ ParquetSink(ArrowColumnWriter(col=14))#ID(can spill: false) consumed x KB,
peak x KB:
+stack backtrace:
+ 0: ParquetSink(ArrowColumnWriter(col=14))#ID(can spill: false) consumed x
KB, peak x KB
+ 1: ParquetSink(ParallelColumnWriters)#ID(can spill: false) consumed x B,
peak x B
+ 2: ParquetSink(ParallelWriter)#ID(can spill: false) consumed x B, peak x B
+,
+ ParquetSink(ArrowColumnWriter(col=0))#ID(can spill: false) consumed x KB,
peak x KB:
+stack backtrace:
+ 0: ParquetSink(ArrowColumnWriter(col=0))#ID(can spill: false) consumed x
KB, peak x KB
+ 1: ParquetSink(ParallelColumnWriters)#ID(can spill: false) consumed x B,
peak x B
+ 2: ParquetSink(ParallelWriter)#ID(can spill: false) consumed x B, peak x B
+,
+ ParquetSink(ArrowColumnWriter(col=2))#ID(can spill: false) consumed x KB,
peak x KB:
+stack backtrace:
+ 0: ParquetSink(ArrowColumnWriter(col=2))#ID(can spill: false) consumed x
KB, peak x KB
+ 1: ParquetSink(ParallelColumnWriters)#ID(can spill: false) consumed x B,
peak x B
+ 2: ParquetSink(ParallelWriter)#ID(can spill: false) consumed x B, peak x B
+,
+ ParquetSink(ArrowColumnWriter(col=1))#ID(can spill: false) consumed x KB,
peak x KB:
+stack backtrace:
+ 0: ParquetSink(ArrowColumnWriter(col=1))#ID(can spill: false) consumed x
KB, peak x KB
+ 1: ParquetSink(ParallelColumnWriters)#ID(can spill: false) consumed x B,
peak x B
+ 2: ParquetSink(ParallelWriter)#ID(can spill: false) consumed x B, peak x B
+.
+Error: Failed to allocate additional x KB for
ParquetSink(ArrowColumnWriter(col=1)) with x KB already allocated for this
reservation - x KB remain available for the total pool",
Review Comment:
This is an example of using the parent/child relationship to build a trace
of consumers.
The downside of this approach is that the parent's bytes (consumed & peak)
do NOT include the children's bytes. This is because of how the existing memory
reservation system works: each reservation only tracks itself and does not
affect its parent.
In fact, I'm adding the concept of a parent reservation in this PR -- in
order to create these traces. I didn't think it wise to try updating values on
parent reservations as well, since that's an expensive operation done
frequently (e.g. grow, shrink, resize).
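
To illustrate the trade-off, here is a minimal sketch of the idea, not the code in this PR. `Reservation`, `grow`, `shrink`, `trace`, and `example_trace` are hypothetical names, and the real types in DataFusion differ. Each reservation keeps its own counters and holds only a link to its parent, so the hot path touches one reservation and the trace is built by walking the links when an error is reported.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a memory reservation (not DataFusion's actual type).
struct Reservation {
    name: String,
    consumed: AtomicUsize,
    peak: AtomicUsize,
    /// Link used only to build a trace; the parent's counters are never updated.
    parent: Option<Arc<Reservation>>,
}

impl Reservation {
    fn new(name: &str, parent: Option<Arc<Reservation>>) -> Arc<Self> {
        Arc::new(Self {
            name: name.to_string(),
            consumed: AtomicUsize::new(0),
            peak: AtomicUsize::new(0),
            parent,
        })
    }

    /// Hot path: updates this reservation only, with no walk up the parent chain.
    fn grow(&self, bytes: usize) {
        let new_total = self.consumed.fetch_add(bytes, Ordering::Relaxed) + bytes;
        self.peak.fetch_max(new_total, Ordering::Relaxed);
    }

    /// Hot path: also local to this reservation.
    fn shrink(&self, bytes: usize) {
        self.consumed.fetch_sub(bytes, Ordering::Relaxed);
    }

    /// Cold path: follow the parent links to produce one line per ancestor.
    fn trace(&self) -> Vec<String> {
        let mut lines = Vec::new();
        let mut current = Some(self);
        let mut depth = 0;
        while let Some(r) = current {
            lines.push(format!(
                "{}: {} consumed {} B, peak {} B",
                depth,
                r.name,
                r.consumed.load(Ordering::Relaxed),
                r.peak.load(Ordering::Relaxed)
            ));
            current = r.parent.as_deref();
            depth += 1;
        }
        lines
    }
}

/// Builds a three-level chain and grows only the leaf.
fn example_trace() -> Vec<String> {
    let root = Reservation::new("ParquetSink(ParallelWriter)", None);
    let mid = Reservation::new("ParquetSink(ParallelColumnWriters)", Some(root));
    let leaf = Reservation::new("ParquetSink(ArrowColumnWriter(col=1))", Some(mid));
    leaf.grow(1024);
    leaf.shrink(256);
    leaf.trace()
}
```

In this sketch the leaf reports its own usage while both ancestors stay at zero. That matches the behaviour described above: the trace shows the chain of consumers, but a parent's numbers do not include those of its children.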
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]