tijmenr opened a new issue, #5445: URL: https://github.com/apache/arrow-rs/issues/5445
**Describe the bug**

In: https://github.com/apache/arrow-rs/blob/db811083669df66992008c9409b743a2e365adb0/arrow-buffer/src/builder/null.rs#L57

The assertion check is `len < capacity`, where `len` is the number of boolean null/non-null bit values, and `capacity` is `buffer.len() * 8`, with `buffer.len()` the size in bytes of the buffer containing those bits. If `len` is a multiple of 8, say `8b`, then the buffer used to store it has length (`buffer.len()`) `b`, and `len == capacity`. This valid situation fails the assertion check.

**To Reproduce**

```rust
// Assume pa: PrimitiveArray holding values including nulls, with length a multiple of 8
let b = pa.into_builder().expect("into_builder");
```

> thread 'main' panicked at .../.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-50.0.0/src/builder/null.rs:57:9:
> assertion failed: len < capacity
> stack backtrace:
> ...
> 2: core::panicking::panic at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:127:5
> 3: arrow_buffer::builder::null::NullBufferBuilder::new_from_buffer
> 4: arrow_array::builder::primitive_builder::PrimitiveBuilder<T>::new_from_buffer::{{closure}} at .../.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-array-50.0.0/src/builder/primitive_builder.rs:164:27
> 5: core::option::Option<T>::map at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/option.rs:1072:29
> 6: arrow_array::builder::primitive_builder::PrimitiveBuilder<T>::new_from_buffer at .../.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-array-50.0.0/src/builder/primitive_builder.rs:163:35
> 7: arrow_array::array::primitive_array::PrimitiveArray<T>::into_builder at .../.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-array-50.0.0/src/array/primitive_array.rs:946:46

**Expected behavior**

`into_builder()` succeeds.
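
The arithmetic behind the failing assertion can be checked without Arrow at all. The following is a minimal, standard-library-only sketch; the helper names (`bytes_for_bits`, `bit_capacity`, `strict_check`, `inclusive_check`) are hypothetical and not part of arrow-rs, and it assumes the bit buffer is sized to exactly `ceil(len / 8)` bytes, which a real buffer need not be.

```rust
/// Bytes needed to store `len` validity bits, one bit per slot.
/// Assumption: the buffer is tightly sized, with no spare bytes.
fn bytes_for_bits(len: usize) -> usize {
    (len + 7) / 8
}

/// Number of bits a buffer of `bytes` bytes can hold.
fn bit_capacity(bytes: usize) -> usize {
    bytes * 8
}

/// The check as quoted in the report: `len < capacity`.
fn strict_check(len: usize) -> bool {
    len < bit_capacity(bytes_for_bits(len))
}

/// An inclusive variant, `len <= capacity`, shown only for comparison.
fn inclusive_check(len: usize) -> bool {
    len <= bit_capacity(bytes_for_bits(len))
}

fn main() {
    for len in [7usize, 8, 9, 16] {
        let bytes = bytes_for_bits(len);
        println!(
            "len={len} bytes={bytes} capacity={} strict={} inclusive={}",
            bit_capacity(bytes),
            strict_check(len),
            inclusive_check(len)
        );
    }
}
```

Under that sizing assumption, the strict comparison rejects exactly the lengths that fill the last byte completely, while the inclusive comparison accepts them.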

**Additional context**

In my use case, I am parsing a text file containing number values (or some value indicating "null") and storing these in a PrimitiveBuilder (for later conversion to an Array and further processing). I do not know in advance what number type a series of values has: they might all be integers, but it could also be that the first few look like integers while later ones are floats. So I start out with e.g. a `PrimitiveBuilder<Int32Type>` and parse the next string value into an int. If it parses OK, I add the value; if the parse fails, I try to parse it as e.g. an `f32`. If that succeeds, then apparently this series of values is actually a series of floats, not ints. So I basically want to cast/convert the builder with the already collected values to another type, and continue parsing.

One approach I tried:

```rust
use arrow::array::PrimitiveBuilder;
use arrow::datatypes::ArrowPrimitiveType;
use num::cast::{AsPrimitive, NumCast};

trait Caster<U> {
    fn cast(&mut self) -> PrimitiveBuilder<U>
    where
        U: ArrowPrimitiveType,
        U::Native: NumCast;
}

impl<T, U> Caster<U> for PrimitiveBuilder<T>
where
    T: ArrowPrimitiveType,
    U: ArrowPrimitiveType,
    T::Native: AsPrimitive<U::Native>,
{
    fn cast(&mut self) -> PrimitiveBuilder<U> {
        // Freeze the values collected so far into an array.
        let src_array = self.finish();
        // Convert every value to the target native type.
        let dst_array = src_array.unary::<_, U>(AsPrimitive::<U::Native>::as_);
        // Turn the converted array back into a builder to continue appending.
        dst_array.into_builder().expect("Converting array to builder")
    }
}
```

Here, if the source contained nulls, `into_builder` returns an `Err(...)`, which I assume is related to how the `unary` function clones the NullBuffer. It would be nice if `into_builder` handled this better, by creating a new null buffer if it cannot reuse the existing one.
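
If the assumption above is right and the `Err` comes from the null buffer being shared between the source and converted arrays, the behaviour resembles what the standard library does for reference-counted data. The sketch below is only an analogy using `std::sync::Arc`, not Arrow's actual buffer code; `try_reclaim` is a hypothetical helper.

```rust
use std::sync::Arc;

/// Try to take exclusive ownership of shared bytes.
/// Succeeds only when no other handle to the same allocation exists.
fn try_reclaim(shared: Arc<Vec<u8>>) -> Result<Vec<u8>, Arc<Vec<u8>>> {
    Arc::try_unwrap(shared)
}

fn main() {
    // Stand-in for a validity bitmap that two arrays both point at.
    let bits = Arc::new(vec![0b1111_0111u8]);
    let other_handle = Arc::clone(&bits);

    // While a second handle is alive, the bytes cannot be reclaimed.
    let bits = try_reclaim(bits).expect_err("buffer is still shared");

    // Once the other handle is gone, reclaiming succeeds.
    drop(other_handle);
    let owned = try_reclaim(bits).expect("buffer is now uniquely owned");
    println!("reclaimed {} byte(s)", owned.len());
}
```

In this analogy, dropping the other owner before converting, or copying the bits into a fresh allocation, are the two ways to end up with a mutable buffer.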

The next approach I tried:

```rust
use arrow::array::PrimitiveBuilder;
use arrow::datatypes::ArrowPrimitiveType;
use num::cast::NumCast;

trait Caster<U> {
    fn cast(&mut self) -> PrimitiveBuilder<U>
    where
        U: ArrowPrimitiveType,
        U::Native: NumCast;
}

impl<T, U> Caster<U> for PrimitiveBuilder<T>
where
    T: ArrowPrimitiveType,
    T::Native: NumCast,
    U: ArrowPrimitiveType,
    U::Native: NumCast,
{
    fn cast(&mut self) -> PrimitiveBuilder<U> {
        // Freeze the values collected so far into an array.
        let src_array = self.finish();
        // Convert each value; a failed numeric cast becomes a null.
        let dst_array = src_array.unary_opt::<_, U>(num::cast::cast::<T::Native, U::Native>);
        // Turn the converted array back into a builder to continue appending.
        dst_array.into_builder().expect("Converting array to builder")
    }
}
```

This panics due to the `len < capacity` assertion if the source contains nulls and has a length divisible by 8.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
