tijmenr opened a new issue, #5445:
URL: https://github.com/apache/arrow-rs/issues/5445

   **Describe the bug**
   In: 
https://github.com/apache/arrow-rs/blob/db811083669df66992008c9409b743a2e365adb0/arrow-buffer/src/builder/null.rs#L57
   
   The assertion check is `len < capacity`, where `len` is the number of 
boolean null/non-null bit values, and `capacity` is `buffer.len() * 8`, with 
`buffer.len()` the size of the buffer containing those bits. If `len` is a 
multiple of 8, say `8b`, then the buffer used to store it has length 
(`buffer.len()`) `b`, and `len == capacity`. This valid situation fails the 
assertion check.
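   
   The arithmetic above can be sketched with plain integers, independent of 
arrow-rs. This is only an illustration of the boundary case: the function names 
(`bit_capacity`, `bytes_for_bits`, `strict_check`, `relaxed_check`) are 
hypothetical, and the relaxed `<=` comparison is an assumption about what an 
accepting check could look like, not the library's actual code.
   
   ```rust
   /// Number of validity bits a buffer of `buffer_len_bytes` bytes can hold.
   fn bit_capacity(buffer_len_bytes: usize) -> usize {
       buffer_len_bytes * 8
   }
   
   /// Smallest number of bytes that can hold `len` validity bits.
   fn bytes_for_bits(len: usize) -> usize {
       (len + 7) / 8
   }
   
   /// The comparison as described in this report: `len < capacity`.
   fn strict_check(len: usize, buffer_len_bytes: usize) -> bool {
       len < bit_capacity(buffer_len_bytes)
   }
   
   /// Hypothetical relaxed comparison (an assumption, not the library's code).
   fn relaxed_check(len: usize, buffer_len_bytes: usize) -> bool {
       len <= bit_capacity(buffer_len_bytes)
   }
   ```
   
   With `len = 16`, a tightly sized buffer has `bytes_for_bits(16) == 2` bytes, 
so `bit_capacity(2) == 16` and `strict_check(16, 2)` is false even though all 16 
bits fit, while `relaxed_check(16, 2)` is true.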
   
   **To Reproduce**
   ```rust
   // Assume pa: PrimitiveArray holding values including nulls, with length a
   // multiple of 8
   let b = pa.into_builder().expect("into_builder");
   ```
   > thread 'main' panicked at 
.../.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-50.0.0/src/builder/null.rs:57:9:
     assertion failed: len < capacity
     stack backtrace:
      ...
      2: core::panicking::panic
                at 
/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:127:5
      3: arrow_buffer::builder::null::NullBufferBuilder::new_from_buffer
      4: 
arrow_array::builder::primitive_builder::PrimitiveBuilder<T>::new_from_buffer::{{closure}}
                at 
.../.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-array-50.0.0/src/builder/primitive_builder.rs:164:27
      5: core::option::Option<T>::map
                at 
/rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/option.rs:1072:29
      6: 
arrow_array::builder::primitive_builder::PrimitiveBuilder<T>::new_from_buffer
                at 
.../.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-array-50.0.0/src/builder/primitive_builder.rs:163:35
      7: arrow_array::array::primitive_array::PrimitiveArray<T>::into_builder
                at 
.../.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-array-50.0.0/src/array/primitive_array.rs:946:46
   
   
   **Expected behavior**
   `into_builder()` succeeds.
   
   **Additional context**
   In my use case, I am parsing a text file containing number values (or some 
value indicating "null") and storing them in a PrimitiveBuilder (to convert to 
an Array later for further processing). I do not know in advance what number 
type a series of values has: they might all be integers, but it could also be 
that the first few look like integers while later ones are floats. So I start 
out with e.g. a PrimitiveBuilder<Int32Type> and parse the next string value 
into an int. If it parses OK, I add the value; if the parse fails, I try to 
parse it as e.g. an f32. If that succeeds, then apparently this series of 
values is actually a series of floats, not ints. So I basically want to 
cast/convert the builder with the already collected values to another type and 
continue parsing.
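   
   To make the intended control flow concrete, here is a minimal sketch of the 
"start as ints, widen to floats on the first parse failure" strategy. It 
deliberately uses plain `Vec<Option<_>>` instead of `PrimitiveBuilder`, so it 
says nothing about the arrow-rs API; `Column`, `parse_column`, and the 
`null_token` parameter are hypothetical names introduced for illustration.
   
   ```rust
   /// Values collected so far, either still all integers or widened to floats.
   #[derive(Debug, PartialEq)]
   enum Column {
       Ints(Vec<Option<i32>>),
       Floats(Vec<Option<f32>>),
   }
   
   /// Parse tokens, switching from i32 to f32 the first time an int parse fails.
   fn parse_column(tokens: &[&str], null_token: &str) -> Result<Column, String> {
       let mut col = Column::Ints(Vec::new());
       for &tok in tokens {
           // A null marker is stored as `None` whatever the current type is.
           if tok == null_token {
               match &mut col {
                   Column::Ints(v) => v.push(None),
                   Column::Floats(v) => v.push(None),
               }
               continue;
           }
           col = match col {
               Column::Ints(mut v) => match tok.parse::<i32>() {
                   Ok(i) => {
                       v.push(Some(i));
                       Column::Ints(v)
                   }
                   Err(_) => {
                       // Not an int: try f32 and convert what was collected so far.
                       let f = tok
                           .parse::<f32>()
                           .map_err(|e| format!("cannot parse {tok:?}: {e}"))?;
                       let mut w: Vec<Option<f32>> =
                           v.into_iter().map(|x| x.map(|i| i as f32)).collect();
                       w.push(Some(f));
                       Column::Floats(w)
                   }
               },
               Column::Floats(mut v) => {
                   let f = tok
                       .parse::<f32>()
                       .map_err(|e| format!("cannot parse {tok:?}: {e}"))?;
                   v.push(Some(f));
                   Column::Floats(v)
               }
           };
       }
       Ok(col)
   }
   ```
   
   For example, `parse_column(&["1", "null", "2.5"], "null")` ends up as 
`Column::Floats` with the earlier `1` converted to `1.0` and the null 
preserved. The conversion step in the `Err(_)` arm is the part I would like to 
do on a builder.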
   
   One approach I tried:
   
   ```rust
   use arrow_array::{builder::PrimitiveBuilder, types::ArrowPrimitiveType};
   use num::cast::{AsPrimitive, NumCast};
   
   trait Caster<U> {
       fn cast(&mut self) -> PrimitiveBuilder<U>
       where U: ArrowPrimitiveType, U::Native: NumCast;
   }
   
   impl<T, U> Caster<U> for PrimitiveBuilder<T>
   where T: ArrowPrimitiveType, U: ArrowPrimitiveType, T::Native: AsPrimitive<U::Native>
   {
       fn cast(&mut self) -> PrimitiveBuilder<U> {
           let src_array = self.finish();
           // Convert every value; `unary` reuses (clones) the source null buffer.
           let dst_array = src_array.unary::<_, U>(AsPrimitive::<U::Native>::as_);
           dst_array.into_builder().expect("Converting array to builder")
       }
   }
   ```
   
   Here, if the source contained nulls, the `into_builder` returns an Err(...), 
I assume related to how the `unary` function clones the NullBuffer. It would be 
nice if `into_builder` handles this better, by creating a new null buffer if it 
cannot reuse the existing one.
   
   The next approach I tried:
   
   ```rust
   use arrow_array::{builder::PrimitiveBuilder, types::ArrowPrimitiveType};
   use num::cast::NumCast;
   
   trait Caster<U> {
       fn cast(&mut self) -> PrimitiveBuilder<U>
       where U: ArrowPrimitiveType, U::Native: NumCast;
   }
   
   impl<T, U> Caster<U> for PrimitiveBuilder<T>
   where T: ArrowPrimitiveType, T::Native: NumCast, U: ArrowPrimitiveType, U::Native: NumCast
   {
       fn cast(&mut self) -> PrimitiveBuilder<U> {
           let src_array = self.finish();
           // `unary_opt` turns values that fail the cast into nulls.
           let dst_array = src_array.unary_opt::<_, U>(num::cast::cast::<T::Native, U::Native>);
           dst_array.into_builder().expect("Converting array to builder")
       }
   }
   ```
   
   This panics due to the `len < capacity` assertion, if the source contains 
nulls and has a length divisible by 8.
   

