joshg-ec opened a new issue, #4324:
URL: https://github.com/apache/arrow-rs/issues/4324
**Describe the bug**
`concat`, used by `concat_batches`, does not appear to allocate sufficient
`capacities` when constructing the `MutableArrayData`. Concatenating records
that contain lists of structs results in the following panic:
```
assertion failed: total_len <= bit_len
thread 'concat_test' panicked at 'assertion failed: total_len <= bit_len',
/Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-40.0.0/src/buffer/boolean.rs:55:9
stack backtrace:
0: rust_begin_unwind
at
/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
1: core::panicking::panic_fmt
at
/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
2: core::panicking::panic
at
/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:114:5
3: arrow_buffer::buffer::boolean::BooleanBuffer::new
at
/Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-40.0.0/src/buffer/boolean.rs:55:9
4: arrow_data::transform::_MutableArrayData::freeze::{{closure}}
at
/Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:81:25
5: core::bool::<impl bool>::then
at
/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/bool.rs:71:24
6: arrow_data::transform::_MutableArrayData::freeze
at
/Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:80:21
7: arrow_data::transform::MutableArrayData::freeze
at
/Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
8: arrow_data::transform::_MutableArrayData::freeze
at
/Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
9: arrow_data::transform::MutableArrayData::freeze
at
/Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
10: arrow_data::transform::_MutableArrayData::freeze
at
/Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
11: arrow_data::transform::MutableArrayData::freeze
at
/Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
12: arrow_data::transform::_MutableArrayData::freeze
at
/Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
13: arrow_data::transform::MutableArrayData::freeze
at
/Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
```
**To Reproduce**
Call `concat_batches` with `RecordBatch`s that contain lists of structs (on
the order of 20–50 structs in the list per `RecordBatch`). If I modify [the
capacity calculation in
concat](https://github.com/apache/arrow-rs/blob/c295b172b37902d5fa41ef275ff5b86caf9fde75/arrow-select/src/concat.rs#L76-L82)
to add a constant factor for lists, the error does not occur:
```rust
let capacity = match d {
DataType::Utf8 => binary_capacity::<Utf8Type>(arrays),
DataType::LargeUtf8 => binary_capacity::<LargeUtf8Type>(arrays),
DataType::Binary => binary_capacity::<BinaryType>(arrays),
DataType::LargeBinary => binary_capacity::<LargeBinaryType>(arrays),
DataType::List(_) => {
Capacities::Array(arrays.iter().map(|a| a.len()).sum::<usize>()
+ 500) // <- 500 added here
}
_ => Capacities::Array(arrays.iter().map(|a| a.len()).sum()),
};
```
**Expected behavior**
No panics when concatenating lists.
**Additional context**
Reproduced with Arrow versions 37--40. Error does not occur in version 34.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]