etseidl commented on issue #6219: URL: https://github.com/apache/arrow-rs/issues/6219#issuecomment-2285000605
Pardon a little more spam on this, but as I dig deeper into the `FixedSizeBinaryArray->PrimitiveArray<T>` transformation, one thing I noticed is that we're creating an iterator of `Option<NativeType>`, and passing that to `from_iter` https://github.com/apache/arrow-rs/blob/a693f0f9c37567b2b121e261fc0a4587776d5ca4/arrow-array/src/array/primitive_array.rs#L1318 which iterates over the values, collecting them into a `Buffer`, while creating a null buffer as it goes. But in `FixedLenByteArrayReader::consume_batch`, we already have a null buffer in `binary` https://github.com/apache/arrow-rs/blob/a693f0f9c37567b2b121e261fc0a4587776d5ca4/parquet/src/arrow/array_reader/fixed_len_byte_array.rs#L163 so it seems a waste to create one again.

I'm wondering if it would make sense here to (in the cases where we're converting from `FixedSizeBinaryArray` to `PrimitiveArray<>`) take the null buffer from `binary`, modify the iterator to return the native type rather than an `Option`, and create the `PrimitiveArray` from the iterator and the stolen null buffer. I've implemented a PoC that knocks off a good bit of time (1.0ms -> 740us for the FLBA/Decimal128 bench).

```rust
ArrowType::Decimal128(p, s) => {
    let nb = binary.take_nulls();
    let decimal = binary
        .iter()
        .map(|o| match o {
            Some(b) => i128::from_be_bytes(sign_extend_be(b)),
            None => i128::default(),
        });
    let decimal = Decimal128Array::from_iter_values_with_nulls(decimal, nb)
        .with_precision_and_scale(*p, *s)?;
    Arc::new(decimal)
}
```

Am I missing something subtle (not out of the question...this is my 5th attempt or so) that would break this? In particular, is it safe to assume the null buffer in `binary` would be the same as computed in `from_iter()`? @tustvold @alamb
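To make the comparison concrete, here is a minimal stand-alone sketch of the two shapes using only the standard library. It is a simplified model, not the arrow-rs code: a `Vec<bool>` stands in for the null buffer, a flat byte slice plus a `width` stands in for the `FixedSizeBinaryArray`, and the function names `convert_rebuilding_nulls` and `convert_reusing_nulls` are hypothetical. The local `sign_extend_be` is written for this sketch and only assumed to behave like the helper of the same name used in the PoC.

```rust
/// Sign-extend a big-endian two's-complement value (at most 16 bytes) to 16 bytes.
/// Assumption: this mirrors what the PoC's `sign_extend_be` helper does.
fn sign_extend_be(b: &[u8]) -> [u8; 16] {
    assert!(b.len() <= 16, "value too wide for i128");
    let is_negative = b.first().map_or(false, |v| v & 0x80 != 0);
    let mut out = if is_negative { [0xFF; 16] } else { [0x00; 16] };
    out[16 - b.len()..].copy_from_slice(b);
    out
}

/// Models the current path: produce `Option<i128>` per slot and rebuild the
/// validity mask while collecting the values.
fn convert_rebuilding_nulls(values: &[u8], width: usize, valid: &[bool]) -> (Vec<i128>, Vec<bool>) {
    let mut out = Vec::with_capacity(valid.len());
    let mut nulls = Vec::with_capacity(valid.len());
    for (chunk, &is_valid) in values.chunks_exact(width).zip(valid) {
        let v = is_valid.then(|| i128::from_be_bytes(sign_extend_be(chunk)));
        nulls.push(v.is_some());
        out.push(v.unwrap_or_default());
    }
    (out, nulls)
}

/// Models the proposed path: map every slot to a plain value (default for
/// null slots) and hand back the existing validity mask untouched.
fn convert_reusing_nulls(values: &[u8], width: usize, valid: Vec<bool>) -> (Vec<i128>, Vec<bool>) {
    let out = values
        .chunks_exact(width)
        .zip(&valid)
        .map(|(chunk, &is_valid)| {
            if is_valid {
                i128::from_be_bytes(sign_extend_be(chunk))
            } else {
                i128::default()
            }
        })
        .collect();
    (out, valid)
}

fn main() {
    let width = 2;
    // Three 2-byte slots: 1, a null slot (bytes ignored), and -2.
    let values = [0x00, 0x01, 0x00, 0x00, 0xFF, 0xFE];
    let valid = vec![true, false, true];

    let rebuilt = convert_rebuilding_nulls(&values, width, &valid);
    let reused = convert_reusing_nulls(&values, width, valid);

    // In this model both paths agree on values and validity.
    assert_eq!(rebuilt, reused);
    println!("{:?}", reused);
}
```

In this toy model the two masks agree by construction, since both are derived from the same validity information. Whether the same holds for the null buffer in `binary` versus the one built by `from_iter()` is exactly the open question above, and the sketch does not settle it.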
