joshg-ec opened a new issue, #4324:
URL: https://github.com/apache/arrow-rs/issues/4324

   **Describe the bug**
   `concat`, used by `concat_batches`, does not appear to allocate sufficient 
`capacities` when constructing the `MutableArrayData`. Concatenating records 
that contain lists of structs results in the following panic:
   ```
   assertion failed: total_len <= bit_len
   thread 'concat_test' panicked at 'assertion failed: total_len <= bit_len', /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-40.0.0/src/buffer/boolean.rs:55:9
   stack backtrace:
      0: rust_begin_unwind
                at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
      1: core::panicking::panic_fmt
                at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
      2: core::panicking::panic
                at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:114:5
      3: arrow_buffer::buffer::boolean::BooleanBuffer::new
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-40.0.0/src/buffer/boolean.rs:55:9
      4: arrow_data::transform::_MutableArrayData::freeze::{{closure}}
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:81:25
      5: core::bool::<impl bool>::then
                at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/bool.rs:71:24
      6: arrow_data::transform::_MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:80:21
      7: arrow_data::transform::MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
      8: arrow_data::transform::_MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
      9: arrow_data::transform::MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
     10: arrow_data::transform::_MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
     11: arrow_data::transform::MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
     12: arrow_data::transform::_MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
     13: arrow_data::transform::MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
   ```
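
For readers unfamiliar with the assertion in the panic message, here is a minimal, std-only sketch of the kind of bounds check it suggests. It assumes, based only on the names `total_len` and `bit_len` in the message, that the check compares the requested bit range (`offset + len`) against the number of bits available in the backing bytes. `bitmap_fits` is a hypothetical helper written for illustration and is not part of the arrow-buffer API.

```rust
/// Hypothetical stand-in for the bounds check behind the panic:
/// a bitmap backed by `bytes` can describe at most `bytes.len() * 8` slots.
/// The variable names mirror the panic message; the exact arithmetic in
/// arrow-buffer is an assumption here.
fn bitmap_fits(bytes: &[u8], offset: usize, len: usize) -> bool {
    // Total number of bits the caller claims are addressable.
    let total_len = offset.saturating_add(len);
    // Number of bits actually present in the backing storage.
    let bit_len = bytes.len().saturating_mul(8);
    total_len <= bit_len
}
```

Under this reading, the panic would mean that a validity bitmap was frozen with fewer bytes than the array length it was asked to cover.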
   
   **To Reproduce**
   Call `concat_batches` with `RecordBatch`es that contain lists of structs
(roughly 20–50 structs per list in each `RecordBatch`). If I modify
[the capacity calculation in `concat`](https://github.com/apache/arrow-rs/blob/c295b172b37902d5fa41ef275ff5b86caf9fde75/arrow-select/src/concat.rs#L76-L82)
to add a constant to the capacity for lists, the error does not occur:
   ```rust
       let capacity = match d {
           DataType::Utf8 => binary_capacity::<Utf8Type>(arrays),
           DataType::LargeUtf8 => binary_capacity::<LargeUtf8Type>(arrays),
           DataType::Binary => binary_capacity::<BinaryType>(arrays),
           DataType::LargeBinary => binary_capacity::<LargeBinaryType>(arrays),
           DataType::List(_) => {
               Capacities::Array(arrays.iter().map(|a| a.len()).sum::<usize>() + 500) // <- 500 added here
           }
           _ => Capacities::Array(arrays.iter().map(|a| a.len()).sum()),
       };
   ```
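
To illustrate why a row-count capacity might be too small for nested data, here is a std-only sketch. It does not use the arrow crates and does not establish the root cause. `ListColumn`, `row_capacity`, and `child_capacity` are hypothetical names, and the offsets layout is a simplified model of a list array in which `offsets[i]..offsets[i + 1]` is the range of child elements for row `i`.

```rust
/// Simplified, hypothetical model of a list column. Offsets are assumed
/// to be monotonically non-decreasing.
struct ListColumn {
    offsets: Vec<usize>,
}

impl ListColumn {
    /// Number of list rows (what `a.len()` counts in the snippet above).
    fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }

    /// Number of child elements referenced by all rows.
    fn child_len(&self) -> usize {
        match (self.offsets.first(), self.offsets.last()) {
            (Some(first), Some(last)) => last - first,
            _ => 0,
        }
    }
}

/// Sum of top-level row counts across the inputs.
fn row_capacity(columns: &[ListColumn]) -> usize {
    columns.iter().map(|c| c.len()).sum()
}

/// Sum of child element counts across the inputs.
fn child_capacity(columns: &[ListColumn]) -> usize {
    columns.iter().map(|c| c.child_len()).sum()
}
```

With one column holding a single list of 30 structs and another holding two lists of 20 and 25 structs, `row_capacity` is 3 while `child_capacity` is 75. This is the kind of gap a fixed `+ 500` could mask for small inputs.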
   
   **Expected behavior**
   No panics when concatenating lists.
   
   **Additional context**
   Reproduced with Arrow versions 37–40. The error does not occur in version 34.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
