Kontinuation commented on issue #15028:
URL: https://github.com/apache/datafusion/issues/15028#issuecomment-2702915602
I tried the repro. This looks more like a problem with the Parquet writer than
with sorting.
I made small tweaks to the repro code to expose the state of the memory
consumers and the backtrace. The failure I got was:
```
Error: Resources exhausted: Additional allocation failed with top memory
consumers (across reservations) as:
ParquetSink(ArrowColumnWriter) consumed 65911731 bytes (62.8 MB),
ExternalSorter[0] consumed 23587552 bytes (22.4 MB),
ExternalSorterMerge[0] consumed 14261080 bytes (13.6 MB),
ParquetSink(SerializedFileWriter) consumed 0 bytes.
Error: Failed to allocate additional 1450451 bytes for
ParquetSink(ArrowColumnWriter) with 62770337 bytes already allocated for this
reservation - 1097237 bytes remain available for the total pool
```
I've slightly reformatted the error message to make it more readable. The
backtrace is:
```
backtrace:
   0: std::backtrace_rs::backtrace::libunwind::trace
             at /rustc/30f168ef811aec63124eac677e14699baa9395bd/library/std/src/../../backtrace/src/backtrace/libunwind.rs:117:9
   1: std::backtrace_rs::backtrace::trace_unsynchronized
             at /rustc/30f168ef811aec63124eac677e14699baa9395bd/library/std/src/../../backtrace/src/backtrace/mod.rs:66:14
   2: std::backtrace::Backtrace::create
             at /rustc/30f168ef811aec63124eac677e14699baa9395bd/library/std/src/backtrace.rs:331:13
   3: datafusion_common::error::DataFusionError::get_back_trace
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-common-45.0.0/src/error.rs:410:30
   4: datafusion_execution::memory_pool::pool::insufficient_capacity_err
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-execution-45.0.0/src/memory_pool/pool.rs:249:5
   5: <datafusion_execution::memory_pool::pool::FairSpillPool as datafusion_execution::memory_pool::MemoryPool>::try_grow
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-execution-45.0.0/src/memory_pool/pool.rs:220:32
   6: <datafusion_execution::memory_pool::pool::TrackConsumersPool<I> as datafusion_execution::memory_pool::MemoryPool>::try_grow
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-execution-45.0.0/src/memory_pool/pool.rs:362:9
   7: datafusion_execution::memory_pool::MemoryReservation::try_grow
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-execution-45.0.0/src/memory_pool/mod.rs:298:9
   8: datafusion_execution::memory_pool::MemoryReservation::try_resize
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-execution-45.0.0/src/memory_pool/mod.rs:281:34
   9: datafusion::datasource::file_format::parquet::column_serializer_task::{{closure}}
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-45.0.0/src/datasource/file_format/parquet.rs:900:9
  10: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /Users/bopeng/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/future/future.rs:124:9
  11: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.43.0/src/runtime/task/core.rs:331:17
  12: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.43.0/src/loom/std/unsafe_cell.rs:16:9
  13: tokio::runtime::task::core::Core<T,S>::poll
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.43.0/src/runtime/task/core.rs:320:13
  14: tokio::runtime::task::harness::poll_future::{{closure}}
             at /Users/bopeng/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.43.0/src/runtime/task/harness.rs:532:19
  15: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /Users/bopeng/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9
  16: std::panicking::try::do_call
             at /Users/bopeng/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:587:40
  17: ___rust_try
  ...
```
The Parquet writer consumed most of the memory and triggered the allocation
failure. The memory available is too small for the Parquet writer to hold a
row group in memory before flushing it to disk.
The issue also reproduces without sorting. Replacing
`let sorted = df.sort(...)` with `let sorted = df` gives this error:
```
Error: Resources exhausted: Additional allocation failed with top memory
consumers (across reservations) as:
ParquetSink(ArrowColumnWriter) consumed 104209999 bytes (99.4 MB),
ParquetSink(SerializedFileWriter) consumed 0 bytes.
Error: Failed to allocate additional 1253843 bytes for
ParquetSink(ArrowColumnWriter) with 99576954 bytes already allocated for this
reservation - 647601 bytes remain available for the total pool
```
Setting a smaller `max_row_group_size` reduces the amount of memory required
by ParquetSink. With the following change the query finished successfully:
```rust
table_opts.global.max_row_group_size = 1000;
```
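A rough sketch of why this helps, under a deliberately simplified model: rows are buffered until the row group is full and then flushed, so peak buffered memory scales with the row group size. The function name, the fixed `bytes_per_row`, and the batch sizes are all hypothetical. The real writer buffers encoded column data and may keep more than one row group in flight, so treat this as an illustration rather than a description of ParquetSink.

```rust
/// Toy model of a writer that buffers rows until a row group is full.
/// Returns the peak number of bytes buffered at any point.
fn peak_buffered_bytes(batch_rows: &[u64], max_row_group_size: u64, bytes_per_row: u64) -> u64 {
    assert!(max_row_group_size > 0, "row group size must be positive");
    let mut buffered_rows = 0u64;
    let mut peak = 0u64;
    for &rows in batch_rows {
        let mut left = rows;
        while left > 0 {
            // Take as many rows as still fit in the current row group.
            let take = left.min(max_row_group_size - buffered_rows);
            buffered_rows += take;
            left -= take;
            peak = peak.max(buffered_rows * bytes_per_row);
            if buffered_rows == max_row_group_size {
                buffered_rows = 0; // row group full: flush to disk
            }
        }
    }
    peak
}

fn main() {
    // Hypothetical input: 200 batches of 8192 rows, 64 bytes per row.
    let batches = vec![8192u64; 200];
    let large = peak_buffered_bytes(&batches, 1_048_576, 64);
    let small = peak_buffered_bytes(&batches, 1000, 64);
    println!("row group of 1048576 rows: peak {} bytes", large); // 67108864
    println!("row group of 1000 rows:    peak {} bytes", small); // 64000
}
```

In this model the peak drops from 64 MiB to about 64 KB, which is the same direction as the observed behavior: with small row groups the writer flushes long before it approaches the pool limit.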