etseidl commented on issue #6219:
URL: https://github.com/apache/arrow-rs/issues/6219#issuecomment-2285000605

   Pardon a little more spam on this, but as I dig deeper into the 
`FixedSizeBinaryArray->PrimitiveArray<T>` transformation, one thing I noticed 
is that we're creating an iterator of `Option<NativeType>`, and passing that to 
`from_iter` 
https://github.com/apache/arrow-rs/blob/a693f0f9c37567b2b121e261fc0a4587776d5ca4/arrow-array/src/array/primitive_array.rs#L1318
 which iterates over the values, collecting them into a `Buffer`, while 
creating a null buffer as it goes. But in 
`FixedLenByteArrayReader::consume_batch`, we already have a null buffer in 
`binary` 
https://github.com/apache/arrow-rs/blob/a693f0f9c37567b2b121e261fc0a4587776d5ca4/parquet/src/arrow/array_reader/fixed_len_byte_array.rs#L163
 so it seems a waste to create one again.
   
   I'm wondering if it would make sense here (in the cases where we're 
converting from `FixedSizeBinaryArray` to `PrimitiveArray<T>`) to take the null 
buffer from `binary`, modify the iterator to return the native type rather than 
an `Option`, and create the `PrimitiveArray` from the iterator and the stolen 
null buffer. I've implemented a PoC that knocks off a good bit of time (1.0ms 
-> 740us for the FLBA/Decimal128 bench).
   ```rust
               ArrowType::Decimal128(p, s) => {
                   let nb = binary.take_nulls();
                   let decimal = binary
                       .iter()
                       .map(|o| match o {
                           Some(b) => i128::from_be_bytes(sign_extend_be(b)),
                           None => i128::default(),
                       });
                   let decimal = Decimal128Array::from_iter_values_with_nulls(decimal, nb)
                       .with_precision_and_scale(*p, *s)?;
                   Arc::new(decimal)
               }
   ```
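
   As a std-only illustration of the idea (not the arrow-rs code), the sketch below compares the two paths on plain vectors. Everything in it is an assumption made for the example: a `Vec<bool>` stands in for the null buffer, `decode_rebuilding_mask` and `decode_reusing_mask` are hypothetical names, and `sign_extend_be` is a local reimplementation rather than the helper from the parquet crate. It only shows that, when the reused mask describes the same slots the `Option` iterator is derived from, both paths give the same values and the same validity.

```rust
// Std-only sketch: Vec<i128> / Vec<bool> stand in for the Arrow value and null buffers.

/// Sign-extend a big-endian two's-complement value of 1..=16 bytes to 16 bytes.
/// Local stand-in for the `sign_extend_be` helper referenced in the PoC.
fn sign_extend_be(b: &[u8]) -> [u8; 16] {
    assert!(!b.is_empty() && b.len() <= 16, "expected 1..=16 bytes");
    // Negative values are padded with 0xFF, non-negative values with 0x00.
    let fill: u8 = if b[0] & 0x80 != 0 { 0xFF } else { 0x00 };
    let mut out = [fill; 16];
    out[16 - b.len()..].copy_from_slice(b);
    out
}

/// Path A: consume `Option`s and rebuild the validity mask while collecting.
fn decode_rebuilding_mask(slots: &[Option<&[u8]>]) -> (Vec<i128>, Vec<bool>) {
    let mut values = Vec::with_capacity(slots.len());
    let mut valid = Vec::with_capacity(slots.len());
    for &slot in slots {
        match slot {
            Some(b) => {
                values.push(i128::from_be_bytes(sign_extend_be(b)));
                valid.push(true);
            }
            None => {
                values.push(i128::default());
                valid.push(false);
            }
        }
    }
    (values, valid)
}

/// Path B: decode plain values (a placeholder for nulls) and reuse an existing mask.
fn decode_reusing_mask(slots: &[Option<&[u8]>], valid: Vec<bool>) -> (Vec<i128>, Vec<bool>) {
    assert_eq!(slots.len(), valid.len(), "mask must cover every slot");
    let values: Vec<i128> = slots
        .iter()
        .map(|slot| match *slot {
            Some(b) => i128::from_be_bytes(sign_extend_be(b)),
            None => i128::default(),
        })
        .collect();
    (values, valid)
}

fn main() {
    let neg_one = [0xFFu8, 0xFF]; // -1 as a 2-byte big-endian value
    let three_hundred = [0x01u8, 0x2C]; // 300 as a 2-byte big-endian value
    let slots: Vec<Option<&[u8]>> = vec![Some(&neg_one[..]), None, Some(&three_hundred[..])];
    // Pretend this mask was already tracked by the reader.
    let existing_mask = vec![true, false, true];

    let a = decode_rebuilding_mask(&slots);
    let b = decode_reusing_mask(&slots, existing_mask);
    assert_eq!(a, b);
    assert_eq!(a.0, vec![-1, 0, 300]);
}
```

   The sketch says nothing about how `binary`'s null buffer is actually produced in the reader, so it does not answer the question on its own; it only pins down what "the same null buffer" would have to mean.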
   Am I missing something subtle (not out of the question...this is my 5th 
attempt or so) that would break this? In particular, is it safe to assume the 
null buffer in `binary` would be the same as computed in `from_iter()`?
   @tustvold  @alamb 

