etseidl opened a new issue, #6296: URL: https://github.com/apache/arrow-rs/issues/6296
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

When reading a Parquet file whose FIXED_LEN_BYTE_ARRAY columns contain nulls, one necessary operation is moving the fixed-length data into the correct location within the output buffer to account for the null slots. This is handled by the [`pad_nulls`](https://github.com/apache/arrow-rs/blob/8c956a9f9ab26c14072740cce64c2b99cb039b13/parquet/src/arrow/array_reader/fixed_len_byte_array.rs#L237) function in the `ValuesBuffer` trait. The inner loop of this function

```rust
for i in 0..byte_length {
    self.buffer[level_pos_bytes + i] = self.buffer[value_pos_bytes + i]
}
```

works well when the fixed width is small (`<= 4`), but for larger widths it is quite inefficient.

**Describe the solution you'd like**

Rewriting the inner loop for longer fixed-size arrays can speed this operation up considerably. In particular, when the copy is expressed as a move from one slice of the buffer to another, non-overlapping slice of the same buffer, the compiler can vectorize it, e.g.

```rust
let split = self.buffer.split_at_mut(level_pos_bytes);
let dst = &mut split.1[..byte_length];
let src = &split.0[value_pos_bytes..value_pos_bytes + byte_length];
for i in 0..byte_length {
    dst[i] = src[i]
}
```

**Describe alternatives you've considered**

I tried [`Vec::copy_within`](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.copy_within), but it was slower than the vectorized copy.
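For readers who want to try the idea outside of arrow-rs, here is a minimal, self-contained sketch of the back-to-front padding pass using the `split_at_mut` copy. The function name `pad_nulls_fixed` and the `&[bool]` validity slice are hypothetical simplifications made for illustration; the real `pad_nulls` works on a packed validity bitmask and takes different parameters. The sketch assumes the number of `true` entries in `valid` equals the number of values already stored in `buffer`.

```rust
/// Hypothetical, simplified stand-in for `pad_nulls` (not the arrow-rs API).
/// `buffer` holds the non-null values packed back to back, `byte_length`
/// bytes each; `valid[i]` says whether output slot `i` is non-null.
fn pad_nulls_fixed(buffer: &mut Vec<u8>, byte_length: usize, valid: &[bool]) {
    let values_read = buffer.len() / byte_length;
    // Grow the buffer so it has one slot per level, null or not.
    buffer.resize(valid.len() * byte_length, 0);

    // Indices of the non-null slots, highest first.
    let valid_rev = valid
        .iter()
        .enumerate()
        .rev()
        .filter(|(_, v)| **v)
        .map(|(i, _)| i);

    // Walk values and target slots from the back so nothing is overwritten
    // before it has been moved.
    for (value_pos, level_pos) in (0..values_read).rev().zip(valid_rev) {
        if level_pos <= value_pos {
            break; // the remaining values are already in their final slots
        }
        let value_pos_bytes = value_pos * byte_length;
        let level_pos_bytes = level_pos * byte_length;

        // Because level_pos > value_pos, the source lies entirely in the
        // head and the destination starts the tail, so they cannot overlap.
        let (head, tail) = buffer.split_at_mut(level_pos_bytes);
        let dst = &mut tail[..byte_length];
        let src = &head[value_pos_bytes..value_pos_bytes + byte_length];
        for i in 0..byte_length {
            dst[i] = src[i];
        }
    }
}
```

Note that, in this sketch, null slots are not cleared: they may hold zeros or stale bytes from a value that was moved, so only the slots marked valid should be read afterwards.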
