alamb commented on code in PR #6244:
URL: https://github.com/apache/arrow-rs/pull/6244#discussion_r1715925061
##########
parquet/src/arrow/array_reader/fixed_len_byte_array.rs:
##########
@@ -165,57 +165,65 @@ impl ArrayReader for FixedLenByteArrayReader {
// TODO: An improvement might be to do this conversion on read
let array: ArrayRef = match &self.data_type {
ArrowType::Decimal128(p, s) => {
- let decimal = binary
- .iter()
- .map(|opt| Some(i128::from_be_bytes(sign_extend_be(opt?))))
- .collect::<Decimal128Array>()
+ let nulls = binary.nulls().cloned();
+ let decimal = binary.iter().map(|o| match o {
Review Comment:
I feel like there may be some further room for improvement here: we could
avoid the branch in the inner loop by applying
`i128::from_be_bytes(sign_extend_be(b))` directly to the values, rather than
having to check each element.
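A rough sketch of the branch-free idea, written against a plain byte slice rather than the real arrow-rs types. It assumes the values are stored contiguously with one `size`-byte slot per element (null slots included, as in Arrow's fixed-size binary layout). `sign_extend_be16` is a hypothetical stand-in for the parquet crate's internal `sign_extend_be`, and `decode_all` is a hypothetical name, not an existing API:

```rust
/// Sign-extend a big-endian two's-complement value of 1..=16 bytes to the
/// 16 bytes `i128::from_be_bytes` expects. Hypothetical stand-in for the
/// parquet crate's internal `sign_extend_be`.
fn sign_extend_be16(b: &[u8]) -> [u8; 16] {
    assert!(!b.is_empty() && b.len() <= 16, "value must be 1..=16 bytes");
    // Arithmetic right shift of the top byte gives 0x00 for non-negative
    // values and 0xFF for negative ones, without a data-dependent branch.
    let fill = ((b[0] as i8) >> 7) as u8;
    let mut out = [fill; 16];
    out[16 - b.len()..].copy_from_slice(b);
    out
}

/// Decode every `size`-byte slot, valid or null, with no per-element
/// validity check. Null slots decode to some unspecified value that is
/// never read; the caller attaches the existing null mask afterwards.
fn decode_all(values: &[u8], size: usize) -> Vec<i128> {
    values
        .chunks_exact(size)
        .map(|b| i128::from_be_bytes(sign_extend_be16(b)))
        .collect()
}
```

The loop body is the same for every element, which is what makes it a candidate for auto-vectorization; whether the compiler actually vectorizes it would need to be checked with a benchmark.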
##########
parquet/src/arrow/array_reader/fixed_len_byte_array.rs:
##########
@@ -165,57 +165,65 @@ impl ArrayReader for FixedLenByteArrayReader {
// TODO: An improvement might be to do this conversion on read
let array: ArrayRef = match &self.data_type {
ArrowType::Decimal128(p, s) => {
- let decimal = binary
- .iter()
- .map(|opt| Some(i128::from_be_bytes(sign_extend_be(opt?))))
- .collect::<Decimal128Array>()
+ let nulls = binary.nulls().cloned();
Review Comment:
I think a comment here would help, explaining that the nulls have already
been handled and that reusing them (rather than re-creating the null buffer)
improves performance.
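A minimal sketch of the kind of comment meant here, using hypothetical stand-in types (`FixedBinary`, `Decimal128`, `to_decimal128`) rather than the real arrow-rs structs. It assumes the validity mask is reference-counted so that cloning it is cheap (modeled with `Arc<Vec<bool>>`; Arrow's real null buffer is bit-packed):

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a fixed-size binary column: packed values plus
/// an optional, reference-counted validity mask (`true` = valid).
struct FixedBinary {
    values: Vec<u8>,
    size: usize,
    nulls: Option<Arc<Vec<bool>>>,
}

/// Hypothetical stand-in for a decimal column with the same null handling.
struct Decimal128 {
    values: Vec<i128>,
    nulls: Option<Arc<Vec<bool>>>,
}

fn to_decimal128(input: &FixedBinary) -> Decimal128 {
    assert!(input.size >= 1 && input.size <= 16, "size must be 1..=16");
    // The nulls have already been computed when `input` was built, so share
    // the existing mask instead of re-creating it: cloning the `Arc` only
    // bumps a reference count, whereas collecting from an iterator of
    // `Option`s would rebuild the validity slot by slot.
    let nulls = input.nulls.clone();
    let values = input
        .values
        .chunks_exact(input.size)
        .map(|b| {
            // Sign-extend the big-endian bytes to 16 bytes. Null slots are
            // decoded too; their values are never read.
            let fill = ((b[0] as i8) >> 7) as u8;
            let mut buf = [fill; 16];
            buf[16 - b.len()..].copy_from_slice(b);
            i128::from_be_bytes(buf)
        })
        .collect();
    Decimal128 { values, nulls }
}
```

The output shares the same mask allocation as the input, so no validity work is repeated.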
##########
parquet/src/arrow/array_reader/fixed_len_byte_array.rs:
##########
@@ -165,57 +165,65 @@ impl ArrayReader for FixedLenByteArrayReader {
// TODO: An improvement might be to do this conversion on read
let array: ArrayRef = match &self.data_type {
ArrowType::Decimal128(p, s) => {
- let decimal = binary
- .iter()
- .map(|opt| Some(i128::from_be_bytes(sign_extend_be(opt?))))
- .collect::<Decimal128Array>()
+ let nulls = binary.nulls().cloned();
+ let decimal = binary.iter().map(|o| match o {
Review Comment:
Or maybe we could implement something like `PrimitiveArray::unary_mut` for
`FixedSizeBinaryArray` to transform the bytes:
https://docs.rs/arrow/latest/arrow/array/struct.PrimitiveArray.html#method.unary_mut
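A rough sketch of what such a helper might look like, written against a plain mutable byte slice rather than the real arrow-rs types. `unary_mut_fixed` and `be_to_le_16` are hypothetical names, not existing arrow-rs APIs. Whereas `PrimitiveArray::unary_mut` maps each native value to a value of the same type, this applies a closure to each `size`-byte slot in place. It therefore only fits transforms that keep the width unchanged (for example, reversing 16-byte big-endian decimals into little-endian order). Narrower inputs that need sign extension would still require a new output buffer.

```rust
/// Hypothetical in-place, `unary_mut`-style transform for a fixed-size
/// binary buffer: `op` is applied to every `size`-byte slot, including null
/// slots, so the loop never consults the validity mask.
fn unary_mut_fixed<F>(values: &mut [u8], size: usize, op: F)
where
    F: Fn(&mut [u8]),
{
    assert!(size > 0 && values.len() % size == 0, "length must be a multiple of size");
    for slot in values.chunks_exact_mut(size) {
        op(slot);
    }
}

/// Example op: reverse a 16-byte big-endian two's-complement value into
/// little-endian byte order, so it can be read with `i128::from_le_bytes`.
fn be_to_le_16(slot: &mut [u8]) {
    slot.reverse();
}
```

After the transform, each 16-byte slot reads back with `i128::from_le_bytes` as the original value.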